CYTOCHROME P450 MONOOXYGENASE AND
NADPH CYTOCHROME P450 OXIDOREDUCTASE GENES AND
PROTEINS RELATED TO THE OMEGA HYDROXYLASE COMPLEX OF
CANDIDA TROPICALIS AND METHODS RELATING THERETO

CROSS REFERENCE TO RELATED APPLICATIONS 
This application claims priority to U.S. Provisional Application Serial No. 60/103,099 filed
October 5, 1998, and U.S. Provisional Application Serial No. 60/083,798 filed May 1, 1998.

BACKGROUND 

1. Field of the Invention

The present invention relates to novel genes which encode enzymes of the ω-hydroxylase
complex in yeast Candida tropicalis strains. In particular, the invention relates to
novel genes encoding the cytochrome P450 and NADPH reductase enzymes of the ω-hydroxylase
complex in yeast Candida tropicalis, and to a method of quantitating the expression
of genes.

2. Description of the Related Art 

Aliphatic dioic acids are versatile chemical intermediates useful as raw
materials for the preparation of perfumes, polymers, adhesives and macrolide antibiotics. While
several chemical routes to the synthesis of long-chain alpha, ω-dicarboxylic acids are available,
the synthesis is not easy and most methods result in mixtures containing shorter chain lengths.
As a result, extensive purification steps are necessary. While it is known that long-chain dioic
acids can also be produced by microbial transformation of alkanes, fatty acids or esters thereof,
chemical synthesis has remained the most commercially viable route, due to limitations with the
current biological approaches.

Several strains of yeast are known to excrete alpha, ω-dicarboxylic acids as a
byproduct when cultured on alkanes or fatty acids as the carbon source. In particular, yeast
belonging to the Genus Candida, such as C. albicans, C. cloacae, C. guillermondii,
C. intermedia, C. lipolytica, C. maltosa, C. parapsilosis and C. zeylenoides are known to produce
such dicarboxylic acids (Agr. Biol. Chem. 35:2033-2042 (1971)). Also, various strains of
C. tropicalis are known to produce dicarboxylic acids ranging in chain lengths from C11 through C18
(Okino et al., BM Lawrence, BD Mookherjee and BJ Willis (eds), in Flavors and Fragrances: A
World Perspective. Proceedings of the 10th International Conference of Essential Oils, Flavors
and Fragrances, Elsevier Science Publishers BV Amsterdam (1988)), and are the basis of several
patents as reviewed by Bühler and Schindler, in Aliphatic Hydrocarbons in Biotechnology, H. J.
Rehm and G. Reed (eds), Vol. 169, Verlag Chemie, Weinheim (1984).

Studies of the biochemical processes by which yeasts metabolize alkanes and fatty
acids have revealed three types of oxidation reactions: α-oxidation of alkanes to alcohols,
ω-oxidation of fatty acids to alpha, ω-dicarboxylic acids and the degradative β-oxidation of fatty
acids to CO2 and water. The first two types of oxidations are catalyzed by microsomal enzymes
while the last type takes place in the peroxisomes. In C. tropicalis, the first step in the
ω-oxidation pathway is catalyzed by a membrane-bound enzyme complex (ω-hydroxylase
complex) including a cytochrome P450 monooxygenase and a NADPH cytochrome reductase.
This hydroxylase complex is responsible for the primary oxidation of the terminal methyl group
in alkanes and fatty acids (Gilewicz et al., Can. J. Microbiol. 25:201 (1979)). The genes which
encode the cytochrome P450 and NADPH reductase components of the complex have previously
been identified as P450ALK and P450RED respectively, and have also been cloned and
sequenced (Sanglard et al., Gene 76:121-136 (1989)). P450ALK has also been designated
P450ALK1. More recently, ALK genes have been designated by the symbol CYP and RED
genes have been designated by the symbol CPR. See, e.g., Nelson, Pharmacogenetics 6(1):1-42
(1996), which is incorporated herein by reference. See also Ohkuma et al., DNA and Cell
Biology 14:163-173 (1995), Seghezzi et al., DNA and Cell Biology, 11:767-780 (1992) and
Kargel et al., Yeast 12:333-348 (1996), each incorporated herein by reference. For example,
P450ALK is also designated CYP52 according to the nomenclature of Nelson, supra. Fatty acids
are ultimately formed from alkanes after two additional oxidation steps, catalyzed by alcohol
oxidase (Kemp et al., Appl. Microbiol. and Biotechnol. 28:370-374 (1988)) and aldehyde
dehydrogenase. The fatty acids can be further oxidized through the same or similar pathway to
the corresponding dicarboxylic acid. The ω-oxidation of fatty acids proceeds via the ω-hydroxy
fatty acid and its aldehyde derivative, to the corresponding dicarboxylic acid without the
requirement for CoA activation. However, both fatty acids and dicarboxylic acids can be
degraded, after activation to the corresponding acyl-CoA ester, through the β-oxidation pathway
in the peroxisomes, leading to chain shortening. In mammalian systems, both fatty acid and
dicarboxylic acid products of ω-oxidation are activated to their CoA-esters at equal rates and are
substrates for both mitochondrial and peroxisomal β-oxidation (J. Biochem., 102:225-234
(1987)). In yeast, β-oxidation takes place solely in the peroxisomes (Agr. Biol. Chem. 49:1821-1828
(1985)).

The production of dicarboxylic acids by fermentation of unsaturated C14-C16
monocarboxylic acids using a strain of the species C. tropicalis is disclosed in U.S. Patent
4,474,882. The unsaturated dicarboxylic acids correspond to the starting materials in the number
and position of the double bonds. Similar processes in which other special microorganisms are
used are described in U.S. Patents 3,975,234 and 4,339,536, in British Patent Specification
1,405,026 and in German Patent Publications 21 64 626, 28 53 847, 29 37 292, 29 51 177, and
21 40 133.

Cytochromes P450 (P450s) are terminal monooxidases of a
multicomponent enzyme system as described above. They comprise a superfamily of proteins
which exist widely in nature, having been isolated from a variety of organisms as described, e.g.,
in Nelson, supra. These organisms include various mammals, fish, invertebrates, plants,
mollusks, crustaceans, lower eukaryotes and bacteria (Nelson, supra). First discovered in rodent
liver microsomes as a carbon-monoxide binding pigment as described, e.g., in Garfinkel, Arch.
Biochem. Biophys. 77:493-509 (1958), which is incorporated herein by reference, P450s were
later named based on their absorption at 450 nm in a reduced-CO coupled difference spectrum
as described, e.g., in Omura et al., J. Biol. Chem. 239:2370-2378 (1964), which is incorporated
herein by reference.

P450s catalyze the metabolism of a variety of endogenous and exogenous
compounds (Nelson, supra). Endogenous compounds include steroids, prostanoids, eicosanoids,
fat-soluble vitamins, fatty acids, mammalian alkaloids, leukotrienes, biogenic amines and
phytoalexins (Nelson, supra). P450 metabolism involves such reactions as epoxidation,
hydroxylation, dealkylation, N-hydroxylation, sulfoxidation, desulfuration and reductive
dehalogenation. These reactions generally make the compound more water soluble, which is
conducive for excretion, and more electrophilic. These electrophilic products can have
detrimental effects if they react with DNA or other cellular constituents. However, they can react
through conjugation with low molecular weight hydrophilic substances resulting in
glucuronidation, sulfation, acetylation, amino acid conjugation or glutathione conjugation,
typically leading to inactivation and elimination as described, e.g., in Klaassen et al., Toxicology,
3rd ed., Macmillan, New York, 1986, incorporated herein by reference.

P450s are heme thiolate proteins consisting of a heme moiety bound to a single
polypeptide chain of 45,000 to 55,000 Da. The iron of the heme prosthetic group is located at
the center of a protoporphyrin ring. Four ligands of the heme iron can be attributed to the
porphyrin ring. The fifth ligand is a thiolate anion from a cysteinyl residue of the polypeptide.
The sixth ligand is probably a hydroxyl group from an amino acid residue, or a moiety with a
similar field strength such as a water molecule as described, e.g., in Goeptar et al., Critical
Reviews in Toxicology 25(1):25-65 (1995), incorporated herein by reference.

Monooxygenation reactions catalyzed by cytochromes P450 in a eukaryotic
membrane-bound system require the transfer of electrons from NADPH to P450 via
NADPH-cytochrome P450 reductase (CPR) as described, e.g., in Taniguchi et al., Arch. Biochem.
Biophys. 232:585 (1984), incorporated herein by reference. CPR is a flavoprotein of
approximately 78,000 Da containing 1 mol of flavin adenine dinucleotide (FAD) and 1 mol of
flavin mononucleotide (FMN) per mole of enzyme as described, e.g., in Potter et al., J. Biol.
Chem. 258:6906 (1983), incorporated herein by reference. The FAD moiety of CPR is the site of
electron entry into the enzyme, whereas FMN is the electron-donating site to P450 as described,
e.g., in Vermilion et al., J. Biol. Chem. 253:8812 (1978), incorporated herein by reference. The
overall reaction is as follows:

H+ + RH + NADPH + O2 → ROH + NADP+ + H2O

Binding of a substrate to the catalytic site of P450 apparently results in a
conformational change initiating electron transfer from CPR to P450. Subsequent to the transfer
of the first electron, O2 binds to the Fe2+-P450-substrate complex to form an Fe3+-P450-substrate
complex. This complex is then reduced by a second electron from CPR or, in some cases,
NADH via cytochrome b5 and NADH-cytochrome b5 reductase as described, e.g., in Guengerich
et al., Arch. Biochem. Biophys. 205:365 (1980), incorporated herein by reference. One atom of
this reactive oxygen is introduced into the substrate, while the other is reduced to water. The
oxygenated substrate then dissociates, regenerating the oxidized form of the cytochrome P450 as
described, e.g., in Klaassen, Amdur and Doull, Casarett and Doull's Toxicology, Macmillan, New
York (1986), incorporated herein by reference.

The P450 reaction cycle can be short-circuited in such a way that O2 is reduced to
O2- and/or H2O2 instead of being utilized for substrate oxygenation. This side reaction is often
referred to as the "uncoupling" of cytochrome P450 as described, e.g., in Kuthen et al., Eur. J.
Biochem. 126:583 (1982) and Poulos et al., FASEB J. 6:674 (1992), both of which are
incorporated herein by reference. The formation of these oxygen radicals may lead to oxidative
cell damage as described, e.g., in Mukhopadhyay, J. Biol. Chem. 269(18):13390-13397 (1994)
and Ross et al., Biochem. Pharm. 49(7):979-989 (1995), both of which are incorporated herein
by reference. It has been proposed that cytochrome b5's effect on P450 binding to the CPR
results in a more stable complex which is less likely to become "uncoupled" as described, e.g., in
Yamazaki et al., Arch. Biochem. Biophys. 325(2):174-182 (1996), incorporated herein by
reference.

P450 families are assigned based upon protein sequence comparisons.
Notwithstanding a certain amount of heterogeneity, a practical classification of P450s into
families can be obtained based on deduced amino acid sequence similarity. P450s with amino
acid sequence similarity of between about 40-80% are considered to be in the same family, with
sequences of greater than about 55% similarity belonging to the same subfamily. Those with
sequence similarity of less than about 40% are generally listed as members of different P450
gene families (Nelson, supra). A value of greater than about 97% is taken to indicate allelic
variants of the same gene, unless proven otherwise based on catalytic activity, sequence
divergence in non-translated regions of the gene sequence, or chromosomal mapping.
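
The similarity thresholds described above can be sketched as a simple decision rule. The following Python snippet is only an illustration: the function name and the returned labels are hypothetical, the cutoffs are the approximate values stated in the text, and the overlap between the stated ranges is resolved here by testing the most stringent threshold first. It does not replace the case-by-case judgment (catalytic activity, non-translated regions, chromosomal mapping) that the text notes may override these rules of thumb.

```python
def classify_p450_pair(similarity_pct: float) -> str:
    """Classify two P450 sequences by percent amino acid sequence similarity.

    Cutoffs are the approximate values given in the text (about 40%, 55%, 97%);
    the function name and labels are illustrative assumptions.
    """
    if not 0.0 <= similarity_pct <= 100.0:
        raise ValueError("similarity must be a percentage between 0 and 100")
    if similarity_pct > 97.0:
        # Taken as allelic variants unless other evidence proves otherwise.
        return "allelic variants"
    if similarity_pct > 55.0:
        return "same subfamily"
    if similarity_pct >= 40.0:
        return "same family"
    return "different families"


if __name__ == "__main__":
    for pct in (30.0, 45.0, 60.0, 98.5):
        print(pct, classify_p450_pair(pct))
```

Testing the strictest cutoff first keeps each pair in exactly one category, which is a simplification of the overlapping ranges given in the text.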

The most highly conserved region is the HR2 consensus containing the invariant
cysteine residue near the carboxyl terminus which is required for heme binding as described, e.g.,
in Gotoh et al., J. Biochem. 93:807-817 (1983) and Motohashi et al., J. Biochem. 101:879-997
(1987), both of which are incorporated herein by reference. Additional consensus regions,
including the central region of helix I and the transmembrane region, have also been identified,
as described, e.g., in Goeptar et al., supra and Kalb et al., PNAS 85:7221-7225 (1988),
incorporated herein by reference, although the HR2 cysteine is the only invariant amino acid
among P450s.

Short chain (≤C12) aliphatic dicarboxylic acids (diacids) are important industrial
intermediates in the manufacture of diesters and polymers, and find application as
thermoplastics, plasticizing agents, lubricants, hydraulic fluids, agricultural chemicals,
pharmaceuticals, dyes, surfactants, and adhesives. The high price and limited availability of
short chain diacids are due to constraints imposed by the existing chemical synthesis.

Long-chain diacids (aliphatic α, ω-dicarboxylic acids with carbon numbers of 12
or greater, hereafter also referred to as diacids) (HOOC-(CH2)n-COOH) are a versatile family of
chemicals with demonstrated and potential utility in a variety of chemical products including
plastics, adhesives, and fragrances. Unfortunately, the full market potential of diacids has not
been realized because chemical processes produce only a limited range of these materials at a
relatively high price. In addition, chemical processes for the production of diacids have a
number of limitations and disadvantages. All the chemical processes are restricted to the
production of diacids of specific carbon chain lengths. For example, the dodecanedioic acid
process starts with butadiene. The resulting product diacids are limited to multiples of
four-carbon lengths and, in practice, only dodecanedioic acid is made. The dodecanedioic process is
based on nonrenewable petrochemical feedstocks. The multireaction conversion process
produces unwanted byproducts, which result in yield losses, NOx pollution and heavy metal
wastes.

Long-chain diacids offer potential advantages over shorter chain diacids, but their
high selling price and limited commercial availability prevent widespread growth in many of
these applications. Biocatalysis offers an innovative way to overcome these limitations with a
process that produces a wide range of diacid products from renewable feedstocks. However,
there is no commercially viable bioprocess to produce long chain diacids from renewable
resources.

SUMMARY OF THE INVENTION 
An isolated nucleic acid is provided which encodes a CPRA protein having the
amino acid sequence set forth in SEQ ID NO: 83. An isolated nucleic acid is also provided
which includes a coding region defined by nucleotides 1006-3042 as set forth in SEQ ID NO: 81.
An isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID
NO: 83. A vector is provided which includes a nucleotide sequence encoding CPRA protein
including an amino acid sequence as set forth in SEQ ID NO: 83. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CPRA protein having an amino acid
sequence as set forth in SEQ ID NO: 83. A method of producing a CPRA protein including an
amino acid sequence as set forth in SEQ ID NO: 83 is also provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 83; and b) culturing the cell under conditions favoring
the expression of the protein.

An isolated nucleic acid is provided which encodes a CPRB protein having the
amino acid sequence set forth in SEQ ID NO: 84. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 1033-3069 as set forth in SEQ ID NO: 82. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
84. A vector is provided which includes a nucleotide sequence encoding CPRB protein
including an amino acid sequence as set forth in SEQ ID NO: 84. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CPRB protein having an amino acid
sequence as set forth in SEQ ID NO: 84. A method of producing a CPRB protein including an
amino acid sequence as set forth in SEQ ID NO: 84 is provided which includes a) transforming a
suitable host cell with a DNA sequence that encodes the protein having the amino acid sequence
as set forth in SEQ ID NO: 84; and b) culturing the cell under conditions favoring the expression
of the protein.

An isolated nucleic acid is provided which encodes a CYP52A1A protein having
the amino acid sequence set forth in SEQ ID NO: 95. An isolated nucleic acid is provided
which includes a coding region defined by nucleotides 1177-2748 as set forth in SEQ ID NO: 85.
An isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID
NO: 95. A vector is provided which includes a nucleotide sequence encoding CYP52A1A protein
including an amino acid sequence as set forth in SEQ ID NO: 95. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CYP52A1A protein having an amino
acid sequence as set forth in SEQ ID NO: 95. A method of producing a CYP52A1A protein
including an amino acid sequence as set forth in SEQ ID NO: 95 is provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 95; and b) culturing the cell under conditions favoring
the expression of the protein.

An isolated nucleic acid encoding a CYP52A2A protein is provided which has the
amino acid sequence set forth in SEQ ID NO: 96. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 1199-2767 as set forth in SEQ ID NO: 86. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
96. A vector is provided which includes a nucleotide sequence encoding CYP52A2A protein
including an amino acid sequence as set forth in SEQ ID NO: 96. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CYP52A2A protein having an amino
acid sequence as set forth in SEQ ID NO: 96. A method of producing a CYP52A2A protein
including an amino acid sequence as set forth in SEQ ID NO: 96 is provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 96; and b) culturing the cell under conditions favoring
the expression of the protein.

An isolated nucleic acid encoding a CYP52A2B protein is provided which has the
amino acid sequence set forth in SEQ ID NO: 97. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 1072-2640 as set forth in SEQ ID NO: 87. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
97. A vector is provided which includes a nucleotide sequence encoding CYP52A2B protein
including an amino acid sequence as set forth in SEQ ID NO: 97. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CYP52A2B protein having an amino
acid sequence as set forth in SEQ ID NO: 97. A method of producing a CYP52A2B protein
including an amino acid sequence as set forth in SEQ ID NO: 97 is provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 97; and b) culturing the cell under conditions favoring
the expression of the protein.

An isolated nucleic acid encoding a CYP52A3A protein is provided which has
the amino acid sequence set forth in SEQ ID NO: 98. An isolated nucleic acid is provided
which includes a coding region defined by nucleotides 1126-2748 as set forth in SEQ ID NO: 88.
An isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID
NO: 98. A vector is provided which includes a nucleotide sequence encoding CYP52A3A
protein including an amino acid sequence as set forth in SEQ ID NO: 98. A host cell is provided
which is transfected or transformed with the nucleic acid encoding CYP52A3A protein having an
amino acid sequence as set forth in SEQ ID NO: 98. A method of producing a CYP52A3A
protein including an amino acid sequence as set forth in SEQ ID NO: 98 is provided which
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 98; and b) culturing the cell under
conditions favoring the expression of the protein.

An isolated nucleic acid encoding a CYP52A3B protein is provided having the
amino acid sequence as set forth in SEQ ID NO: 99. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 913-2535 as set forth in SEQ ID NO: 89. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
99. A vector is provided which includes a nucleotide sequence encoding CYP52A3B protein
including an amino acid sequence as set forth in SEQ ID NO: 99. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CYP52A3B protein having an amino
acid sequence as set forth in SEQ ID NO: 99. A method of producing a CYP52A3B protein
including an amino acid sequence as set forth in SEQ ID NO: 99 is provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 99; and b) culturing the cell under conditions favoring
the expression of the protein.

An isolated nucleic acid encoding a CYP52A5A protein is provided having the
amino acid sequence set forth in SEQ ID NO: 100. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 1103-2656 as set forth in SEQ ID NO: 90. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
100. A vector is provided which includes a nucleotide sequence encoding CYP52A5A protein
including an amino acid sequence as set forth in SEQ ID NO: 100. A host cell is provided
which is transfected or transformed with the nucleic acid encoding CYP52A5A protein having an
amino acid sequence as set forth in SEQ ID NO: 100. A method of producing a CYP52A5A
protein including an amino acid sequence as set forth in SEQ ID NO: 100 is provided which
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 100; and b) culturing the cell under
conditions favoring the expression of the protein.

An isolated nucleic acid encoding a CYP52A5B protein is provided having the
amino acid sequence as set forth in SEQ ID NO: 101. An isolated nucleic acid is provided
which includes a coding region defined by nucleotides 1142-2695 as set forth in SEQ ID NO: 91.
An isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID
NO: 101. A vector is provided which includes a nucleotide sequence encoding CYP52A5B
protein including the amino acid sequence as set forth in SEQ ID NO: 101. A host cell is
provided which is transfected or transformed with the nucleic acid encoding CYP52A5B protein
having the amino acid sequence as set forth in SEQ ID NO: 101. A method of producing a
CYP52A5B protein including an amino acid sequence as set forth in SEQ ID NO: 101 is
provided which includes a) transforming a suitable host cell with a DNA sequence that encodes
the protein having the amino acid sequence as set forth in SEQ ID NO: 101; and b) culturing the
cell under conditions favoring the expression of the protein.

An isolated nucleic acid encoding a CYP52A8A protein is provided having the
amino acid sequence set forth in SEQ ID NO: 102. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 464-2002 as set forth in SEQ ID NO: 92. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
102. A vector is provided which includes a nucleotide sequence encoding CYP52A8A protein
including an amino acid sequence as set forth in SEQ ID NO: 102. A host cell is provided
which is transfected or transformed with the nucleic acid encoding CYP52A8A protein having an
amino acid sequence as set forth in SEQ ID NO: 102. A method of producing a CYP52A8A
protein including an amino acid sequence as set forth in SEQ ID NO: 102 is provided which
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 102; and b) culturing the cell under
conditions favoring the expression of the protein.

An isolated nucleic acid encoding a CYP52A8B protein is provided having the
amino acid sequence set forth in SEQ ID NO: 103. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 1017-2555 as set forth in SEQ ID NO: 93. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
103. A vector is provided which includes a nucleotide sequence encoding CYP52A8B protein
including an amino acid sequence as set forth in SEQ ID NO: 103. A host cell is provided
which is transfected or transformed with the nucleic acid encoding CYP52A8B protein having an
amino acid sequence as set forth in SEQ ID NO: 103. A method of producing a CYP52A8B
protein including an amino acid sequence as set forth in SEQ ID NO: 103 is provided which
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 103; and b) culturing the cell under
conditions favoring the expression of the protein.

An isolated nucleic acid encoding a CYP52D4A protein is provided having the
amino acid sequence set forth in SEQ ID NO: 104. An isolated nucleic acid is provided
including a coding region defined by nucleotides 767-2266 as set forth in SEQ ID NO: 94. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
104. A vector is provided which includes a nucleotide sequence encoding CYP52D4A protein
including an amino acid sequence as set forth in SEQ ID NO: 104. A host cell is provided
which is transfected or transformed with the nucleic acid encoding CYP52D4A protein having an
amino acid sequence as set forth in SEQ ID NO: 104. A method of producing a CYP52D4A
protein including an amino acid sequence as set forth in SEQ ID NO: 104 is provided which
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 104; and b) culturing the cell under
conditions favoring the expression of the protein.
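
As a consistency check on the coding regions recited above, the nucleotide coordinates can be converted to lengths and codon counts. The Python sketch below is illustrative only: the dictionary and function names are hypothetical, the coordinates are copied from the paragraphs above, and it assumes the coordinates are 1-based and inclusive. Whether each region includes the stop codon is not stated in this section, so the codon count is not interpreted as a protein length.

```python
# Coding-region coordinates (start, end) as recited in the summary above.
# Assumption: coordinates are 1-based and inclusive.
CODING_REGIONS = {
    "CPRA": (1006, 3042),
    "CPRB": (1033, 3069),
    "CYP52A1A": (1177, 2748),
    "CYP52A2A": (1199, 2767),
    "CYP52A2B": (1072, 2640),
    "CYP52A3A": (1126, 2748),
    "CYP52A3B": (913, 2535),
    "CYP52A5A": (1103, 2656),
    "CYP52A5B": (1142, 2695),
    "CYP52A8A": (464, 2002),
    "CYP52A8B": (1017, 2555),
    "CYP52D4A": (767, 2266),
}


def coding_region_length(start: int, end: int) -> int:
    """Return the number of nucleotides in an inclusive 1-based interval."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def codon_count(start: int, end: int) -> int:
    """Return the number of codons, rejecting regions that break the reading frame."""
    length = coding_region_length(start, end)
    if length % 3 != 0:
        raise ValueError("coding region length is not a multiple of three")
    return length // 3


if __name__ == "__main__":
    for gene, (start, end) in CODING_REGIONS.items():
        print(gene, coding_region_length(start, end), codon_count(start, end))
```

A region whose length is not a multiple of three would raise an error here, which makes the check useful for catching transcription mistakes in the coordinates.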

A method for discriminating members of a gene family by quantifying the amount
of target mRNA in a sample is provided which includes a) providing an organism containing a
target gene; b) culturing the organism with an organic substrate which causes upregulation in the
activity of the target gene; c) obtaining a sample of total RNA from the organism at a first point
in time; d) combining at least a portion of the sample of the total RNA with a known amount of
competitor RNA to form an RNA mixture, wherein the competitor RNA is substantially similar
to the target mRNA but has a lesser number of nucleotides compared to the target mRNA; e)
adding reverse transcriptase to the RNA mixture in a quantity sufficient to form corresponding
target DNA and competitor DNA; f) conducting a polymerase chain reaction in the presence of
at least one primer specific for at least one substantially non-homologous region of the target
DNA within the gene family, the primer also specific for the competitor DNA; g) repeating steps
(c-f) using increasing amounts of the competitor RNA while maintaining a substantially constant
amount of target RNA; h) determining the point at which the amount of target DNA is
substantially equal to the amount of competitor DNA; i) quantifying the results by comparing the
ratio of the concentration of unknown target to the known concentration of competitor; and j)






WO 00/20566



PCT/US99/207^ 



obtaining a sample of total RNA from the organism at another point in time and repeating steps
(d-i).
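
Steps g) through i) above amount to finding the competitor amount at which target and competitor products are equal. One common way to estimate that equivalence point, offered here only as a sketch and not as the procedure used in this specification, is to fit a straight line to log10(target signal / competitor signal) against log10(competitor amount) and solve for a log ratio of zero. The function name, argument names and synthetic numbers below are hypothetical, and any correction for the shorter length of the competitor product is deliberately omitted.

```python
import math


def equivalence_point(competitor_amounts, target_signals, competitor_signals):
    """Estimate the competitor amount at which target and competitor products are equal.

    Fits log10(target/competitor signal) versus log10(competitor amount) by
    ordinary least squares and returns the amount where the fitted ratio is 1.
    Illustrative assumption: signals are proportional to product amounts.
    """
    if not (len(competitor_amounts) == len(target_signals) == len(competitor_signals)):
        raise ValueError("all input series must have the same length")
    if len(competitor_amounts) < 2:
        raise ValueError("at least two titration points are required")

    xs = [math.log10(a) for a in competitor_amounts]
    ys = [math.log10(t / c) for t, c in zip(target_signals, competitor_signals)]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or sxy == 0:
        raise ValueError("titration series does not define a usable slope")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A ratio of 1 corresponds to log10(ratio) == 0, so solve slope * x + intercept = 0.
    return 10 ** (-intercept / slope)


if __name__ == "__main__":
    # Synthetic titration: constant target, increasing competitor (arbitrary units).
    amounts = [1.0, 2.0, 4.0, 8.0, 16.0]
    target = [5.0] * len(amounts)
    competitor = amounts[:]
    print(equivalence_point(amounts, target, competitor))
```

The estimated equivalence point gives the target amount in the same units as the competitor, so repeating the calculation for samples taken at different time points, as in step j), yields a time course of target mRNA levels.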

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CPRA genes; b)
increasing, in the host cell, the number of CPRA genes which encode a CPRA protein having the
amino acid sequence as set forth in SEQ ID NO: 83; c) culturing the host cell in media
containing an organic substrate which upregulates the CPRA gene, to effect increased production
of dicarboxylic acid.

A method for increasing the production of a CPRA protein having an amino acid
sequence as set forth in SEQ ID NO: 83 is provided which includes a) transforming a host cell
having a naturally occurring amount of CPRA protein with an increased copy number of a CPRA
gene that encodes the CPRA protein having the amino acid sequence as set forth in SEQ ID NO:
83; and b) culturing the cell and thereby increasing expression of the protein compared with that
of a host cell containing a naturally occurring copy number of the CPRA gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CPRB genes; b)
increasing, in the host cell, the number of CPRB genes which encode a CPRB protein having the
amino acid sequence as set forth in SEQ ID NO: 84; c) culturing the host cell in media
containing an organic substrate which upregulates the CPRB gene, to effect increased production
of dicarboxylic acid.

A method for increasing the production of a CPRB protein having an amino acid
sequence as set forth in SEQ ID NO: 84 is provided which includes a) transforming a host cell
having a naturally occurring amount of CPRB protein with an increased copy number of a CPRB
gene that encodes the CPRB protein having the amino acid sequence as set forth in SEQ ID NO:
84; and b) culturing the cell and thereby increasing expression of the protein compared with that
of a host cell containing a naturally occurring copy number of the CPRB gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A1A genes; b)
increasing, in the host cell, the number of CYP52A1A genes which encode a CYP52A1A protein
having the amino acid sequence as set forth in SEQ ID NO: 95; c) culturing the host cell in media
containing an organic substrate which upregulates the CYP52A1A gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A1A protein having an amino
acid sequence as set forth in SEQ ID NO: 95 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A1A protein with an increased copy number of
a CYP52A1A gene that encodes the CYP52A1A protein having the amino acid sequence as set
forth in SEQ ID NO: 95; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52A1A gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A2A genes; b)
increasing, in the host cell, the number of CYP52A2A genes which encode a CYP52A2A protein
having the amino acid sequence as set forth in SEQ ID NO: 96; c) culturing the host cell in media
containing an organic substrate which upregulates the CYP52A2A gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A2A protein having an amino
acid sequence as set forth in SEQ ID NO: 96 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A2A protein with an increased copy number of
a CYP52A2A gene that encodes the CYP52A2A protein having the amino acid sequence as set
forth in SEQ ID NO: 96; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52A2A gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A2B genes; b)
increasing, in the host cell, the number of CYP52A2B genes which encode a CYP52A2B protein
having the amino acid sequence as set forth in SEQ ID NO: 97; c) culturing the host cell in media
containing an organic substrate which upregulates the CYP52A2B gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A2B protein having an amino
acid sequence as set forth in SEQ ID NO: 97 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A2B protein with an increased copy number of
a CYP52A2B gene that encodes the CYP52A2B protein having the amino acid sequence as set
forth in SEQ ID NO: 97; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52A2B gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A3A genes; b)
increasing, in the host cell, the number of CYP52A3A genes which encode a CYP52A3A protein
having the amino acid sequence as set forth in SEQ ID NO: 98; c) culturing the host cell in media
containing an organic substrate which upregulates the CYP52A3A gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A3A protein having an amino
acid sequence as set forth in SEQ ID NO: 98 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A3A protein with an increased copy number of
a CYP52A3A gene that encodes the CYP52A3A protein having the amino acid sequence as set
forth in SEQ ID NO: 98; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52A3A gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A3B genes; b)
increasing, in the host cell, the number of CYP52A3B genes which encode a CYP52A3B protein
having the amino acid sequence as set forth in SEQ ID NO: 99; c) culturing the host cell in media
containing an organic substrate which upregulates the CYP52A3B gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A3B protein having an amino
acid sequence as set forth in SEQ ID NO: 99 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A3B protein with an increased copy number of
a CYP52A3B gene that encodes the CYP52A3B protein having the amino acid sequence as set
forth in SEQ ID NO: 99; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A5A genes; b)
increasing, in the host cell, the number of CYP52A5A genes which encode a CYP52A5A protein
having the amino acid sequence as set forth in SEQ ID NO: 100; c) culturing the host cell in
media containing an organic substrate which upregulates the CYP52A5A gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A5A protein having an amino
acid sequence as set forth in SEQ ID NO: 100 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A5A protein with an increased copy number of
a CYP52A5A gene that encodes the CYP52A5A protein having the amino acid sequence as set
forth in SEQ ID NO: 100; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52A5A gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A5B genes; b)
increasing, in the host cell, the number of CYP52A5B genes which encode a CYP52A5B protein
having the amino acid sequence as set forth in SEQ ID NO: 101; c) culturing the host cell in
media containing an organic substrate which upregulates the CYP52A5B gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A5B protein having an amino
acid sequence as set forth in SEQ ID NO: 101 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A5B protein with an increased copy number of
a CYP52A5B gene that encodes the CYP52A5B protein having the amino acid sequence as set
forth in SEQ ID NO: 101; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52A5B gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A8A genes; b)
increasing, in the host cell, the number of CYP52A8A genes which encode a CYP52A8A protein
having the amino acid sequence as set forth in SEQ ID NO: 102; c) culturing the host cell in
media containing an organic substrate which upregulates the CYP52A8A gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A8A protein having an amino
acid sequence as set forth in SEQ ID NO: 102 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A8A protein with an increased copy number of
a CYP52A8A gene that encodes the CYP52A8A protein having the amino acid sequence as set
forth in SEQ ID NO: 102; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52A8A gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A8B genes; b)
increasing, in the host cell, the number of CYP52A8B genes which encode a CYP52A8B protein
having the amino acid sequence as set forth in SEQ ID NO: 103; c) culturing the host cell in
media containing an organic substrate which upregulates the CYP52A8B gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A8B protein having an amino
acid sequence as set forth in SEQ ID NO: 103 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A8B protein with an increased copy number of
a CYP52A8B gene that encodes the CYP52A8B protein having the amino acid sequence as set
forth in SEQ ID NO: 103; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52A8B gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52D4A genes; b)
increasing, in the host cell, the number of CYP52D4A genes which encode a CYP52D4A protein
having the amino acid sequence as set forth in SEQ ID NO: 104; c) culturing the host cell in
media containing an organic substrate which upregulates the CYP52D4A gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52D4A protein having an amino
acid sequence as set forth in SEQ ID NO: 104 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52D4A protein with an increased copy number
of a CYP52D4A gene that encodes the CYP52D4A protein having the amino acid sequence as set
forth in SEQ ID NO: 104; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52D4A gene.


BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 is a schematic representation of cloning vector pTriplEx from
Clontech Laboratories, Inc. Selected restriction sites within the multiple cloning site are
shown.

Figure 2A is a map of the ZAP Express™ vector.

Figure 2B is a schematic representation of cloning phagemid vector pBK-CMV.

Figure 3 is a double stranded DNA sequence of a portion of the 5 prime coding
region of the CYP52A5A gene (SEQ ID NO: 36).

Figure 4 is a diagrammatic representation of highly conserved regions of CYP and
CPR gene protein sequences. Helix I represents the putative substrate binding site and HR2
represents the heme binding region. The FMN, FAD and NADPH binding regions are indicated
below the CPR gene.

Figure 5 is a diagrammatic representation of the plasmid pHKM1 containing the
truncated CPRA gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame.

Figure 6 is a diagrammatic representation of the plasmid pHKM4 containing the
truncated CPRA gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame.

Figure 7 is a diagrammatic representation of the plasmid pHKM9 containing the
CPRB gene (SEQ ID NO: 82) present in the pBK-CMV vector. A detailed restriction map of
only the sequenced region is shown at the top. The bar indicates the open reading frame. The
direction of transcription is indicated by an arrow under the open reading frame.

Figure 8 is a diagrammatic representation of the plasmid pHKM11 containing the
CYP52A1A gene (SEQ ID NO: 85) present in the pBK-CMV vector. A detailed restriction map
of only the sequenced region is shown at the top. The bar indicates the open reading frame. The
direction of transcription is indicated by an arrow under the open reading frame.

Figure 9 is a diagrammatic representation of the plasmid pHKM12 containing the
CYP52A8A gene (SEQ ID NO: 92) present in the pBK-CMV vector. A detailed restriction map
of only the sequenced region is shown at the top. The bar indicates the open reading frame. The
direction of transcription is indicated by an arrow under the open reading frame.

Figure 10 is a diagrammatic representation of the plasmid pHKM13 containing
the CYP52D4A gene (SEQ ID NO: 94) present in the pBK-CMV vector. A detailed restriction
map of only the sequenced region is shown at the top. The bar indicates the open reading frame.
The direction of transcription is indicated by an arrow under the open reading frame.

Figure 11 is a diagrammatic representation of the plasmid pHKM14 containing
the CYP52A2B gene (SEQ ID NO: 87) present in the pBK-CMV vector. A detailed restriction
map of only the sequenced region is shown at the top. The bar indicates the open reading frame.
The direction of transcription is indicated by an arrow under the open reading frame.

Figure 12 is a diagrammatic representation of the plasmid pHKM15 containing
the CYP52A8B gene (SEQ ID NO: 93) present in the pBK-CMV vector. A detailed restriction
map of only the sequenced region is shown at the top. The bar indicates the open reading frame.
The direction of transcription is indicated by an arrow under the open reading frame.

Figures 13A-13D show the complete DNA sequences including regulatory and
coding regions for the CPRA gene (SEQ ID NO: 81) and CPRB gene (SEQ ID NO: 82) from C.
tropicalis ATCC 20336. Figures 13A-13D show regulatory and coding region alignment of
these sequences. Asterisks indicate conserved nucleotides. Bold indicates protein coding
nucleotides; the start and stop codons are underlined.

Figure 14 shows the amino acid sequence of the CPRA (SEQ ID NO: 83) and
CPRB (SEQ ID NO: 84) proteins from C. tropicalis ATCC 20336 and alignment of these amino
acid sequences. Asterisks indicate residues which are not conserved.

Figures 15A-15M show the complete DNA sequences including regulatory and
coding regions for the following genes from C. tropicalis ATCC 20336: CYP52A1A (SEQ ID
NO: 85), CYP52A2A (SEQ ID NO: 86), CYP52A2B (SEQ ID NO: 87), CYP52A3A (SEQ ID NO:
88), CYP52A3B (SEQ ID NO: 89), CYP52A5A (SEQ ID NO: 90), CYP52A5B (SEQ ID NO: 91),
CYP52A8A (SEQ ID NO: 92), CYP52A8B (SEQ ID NO: 93), and CYP52D4A (SEQ ID NO: 94).
Figures 15A-15M show regulatory and coding region alignment of these sequences. Asterisks
indicate conserved nucleotides. Bold indicates protein coding nucleotides; the start and stop
codons are underlined.

Figures 16A-16C show the amino acid sequences of the CYP52A1A (SEQ
ID NO: 95), CYP52A2A (SEQ ID NO: 96), CYP52A2B (SEQ ID NO: 97), CYP52A3A (SEQ ID
NO: 98), CYP52A3B (SEQ ID NO: 99), CYP52A5A (SEQ ID NO: 100), CYP52A5B (SEQ ID
NO: 101), CYP52A8A (SEQ ID NO: 102), CYP52A8B (SEQ ID NO: 103) and CYP52D4A (SEQ
ID NO: 104) proteins from C. tropicalis ATCC 20336. Asterisks indicate identical residues and
dots indicate conserved residues.

Figure 17 is a diagrammatic representation of the pTAg PCR product cloning
vector (commercially available from R&D Systems, Minneapolis, MN).

Figure 18 is a plot of the log ratio (U/C) of unknown target DNA product to
competitor DNA product versus the concentration of competitor mRNA. The plot is used to
calculate the target messenger RNA concentration in a quantitative competitive reverse
transcription polymerase chain reaction (QC-RT-PCR).

Figure 19 is a graph showing the relative induction of C. tropicalis ATCC 20962
CYP52A5A (SEQ ID NO: 90) by the addition of the fatty acid substrate Emersol® 267 to the
growth medium.

Figure 20 is a graph showing the induction of C. tropicalis ATCC 20962 CYP52
and CPR genes by Emersol® 267. P450 genes CYP52A3A (SEQ ID NO: 88), CYP52A3B (SEQ
ID NO: 89), and CYP52D4A (SEQ ID NO: 94) are expressed at levels below the detection level
of the QC-RT-PCR assay.

Figure 21 is a scheme for integrating selected genes into the genome of Candida
tropicalis strains and recovering the URA3A selectable marker.

Figure 22 is a schematic representation of the transformation of C. tropicalis
H5343 ura3- with CYP and/or CPR genes. Only one URA3 locus needs to be functional. There
are a total of 6 possible ura3 targets (5 ura3A loci: 2 pox4 disruptions, 2 pox5 disruptions, 1
ura3A locus; and 1 ura3B locus).

Figure 23 is the complete DNA sequence (SEQ ID NO: 105) encoding URA3A
from C. tropicalis ATCC 20336 and the amino acid sequence of the encoded protein (SEQ ID
NO: 106).

WO 00/20566 PCT/US99/20797
Figure 24 is a schematic representation of the plasmid pURAin, the base vector
for integrating selected genes into the genome of C. tropicalis. The detailed construction of
pURAin is described in the text.

Figure 25 is a schematic representation of the plasmid pNEB193 cloning vector
(commercially available from New England Biolabs, Beverly, MA).

Figure 26 is a diagrammatic representation of the plasmid pPA15 containing the
truncated CYP52A2A gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame.

Figure 27 is a schematic representation of pURA2in. The base vector is
constructed in pNEB193, which contains the 8 bp recognition sequences for Asc I, Pac I and Pme
I. URA3A (SEQ ID NO: 105) and CYP52A2A (SEQ ID NO: 86) do not contain these 8 bp
recognition sites. URA3A is inverted so that the transforming fragment will attempt to
recircularize prior to integration. An Asc I/Pme I fragment was used to transform H5343 ura-.
Figure 28 shows a scheme to detect integration of CYP52A2A gene (SEQ ID NO:
86) into the genome of H5343 ura-. In all cases, hybridization band intensity could reflect the
number of integrations.

Figure 29 is a diagrammatic representation of the plasmid pPA57 containing the
truncated CYP52A3A gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame.

Figure 30 is a diagrammatic representation of the plasmid pPA62 containing the
truncated CYP52A3B gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame.

Figure 31 is a diagrammatic representation of the plasmid pPAL3 containing the
truncated CYP52A5A gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame.

Figure 32 is a diagrammatic representation of the plasmid pPA5 containing the
truncated CYP52A5A gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame.

Figure 33 is a diagrammatic representation of the plasmid pPA18 containing the
truncated CYP52D4A gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame.

Figure 34 is a graph showing the expression of CYP52A1 (SEQ ID NO: 85),
CYP52A2 (SEQ ID NO: 86) and CYP52A5 genes (SEQ ID NOS: 90 and 91) from C. tropicalis
20962 in a fermentor run upon the addition of amounts of the substrate oleic acid or tridecane in
a spiking experiment.

Figure 35 depicts a scheme used for the extraction and analysis of diacids and 
monoacids from fermentation broths. 

Figure 36 is a graph showing the induction of expression of CYP52A1A,
CYP52A2A and CYP52A5A in a fermentor run upon addition of the substrate octadecane. No
induction of CYP52A3A or CYP52A3B was observed under these conditions.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 
Diacid productivity is improved according to the present invention by selectively
increasing enzymes which are known to be important to the oxidation of organic substrates such
as fatty acids composing the desired feed. According to the present invention, ten CYP genes
and two CPR genes of C. tropicalis have been identified and characterized that relate to
participation in the ω-hydroxylase complex catalyzing the first step in the ω-oxidation pathway.
In addition, a novel quantitative competitive reverse transcription polymerase chain reaction
(QC-RT-PCR) assay is used to measure gene expression in the fermentor under conditions of
induction by one or more organic substrates as defined herein. Based upon QC-RT-PCR results,
three CYP genes, CYP52A1, CYP52A2 and CYP52A5, have been identified as being of greater
importance for the ω-oxidation of long chain fatty acids. Amplification of the CPR gene copy
number improves productivity. The QC-RT-PCR assay indicates that both CYP and CPR genes
appear to be under tight regulatory control.

In accordance with the present invention, a method for discriminating members of
a gene family by quantifying the amount of target mRNA in a sample is provided which
includes a) providing an organism containing a target gene; b) culturing the organism with an
organic substrate which causes upregulation in the activity of the target gene; c) obtaining a
sample of total RNA from the organism at a first point in time; d) combining at least a portion of
the sample of the total RNA with a known amount of competitor RNA to form an RNA mixture,
wherein the competitor RNA is substantially similar to the target mRNA but has a lesser number
of nucleotides compared to the target mRNA; e) adding reverse transcriptase to the RNA mixture
in a quantity sufficient to form corresponding target DNA and competitor DNA; f) conducting a
polymerase chain reaction in the presence of at least one primer specific for at least one
substantially non-homologous region of the target DNA within the gene family, the primer also
specific for the competitor DNA; g) repeating steps (c-f) using increasing amounts of the
competitor RNA while maintaining a substantially constant amount of target RNA; h)
determining the point at which the amount of target DNA is substantially equal to the amount of
competitor DNA; i) quantifying the results by comparing the ratio of the concentration of
unknown target to the known concentration of competitor; and j) obtaining a sample of total
RNA from the organism at another point in time and repeating steps (d-i).
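
Steps g) through j) of the method above can be illustrated numerically. The following is a minimal sketch only, not the procedure of the Examples: it assumes that both the competitor amount and the U/C ratio are log-transformed, that a straight line is fitted by ordinary least squares, and that the target amount is read off where log(U/C) equals zero. The function names and the sample numbers are hypothetical.

```python
import math


def equivalence_point(competitor_amounts, target_to_competitor_ratios):
    """Estimate the target amount as the competitor amount at which U/C = 1.

    Assumption: log10(U/C) is fitted as a straight line against
    log10(competitor amount); the fit is solved for log10(U/C) = 0.
    """
    xs = [math.log10(c) for c in competitor_amounts]
    ys = [math.log10(r) for r in target_to_competitor_ratios]
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two matched titration points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("competitor amounts must differ")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    if slope == 0:
        raise ValueError("ratio does not change with competitor amount")
    intercept = mean_y - slope * mean_x
    # log10(U/C) = 0  =>  log10(competitor) = -intercept / slope
    return 10 ** (-intercept / slope)


def fold_induction(target_at_first_time, target_at_later_time):
    """Relative induction between two sampling times (step j)."""
    return target_at_later_time / target_at_first_time


if __name__ == "__main__":
    # Hypothetical titration: ideal data built so the true target amount is 50.
    competitor = [1.0, 10.0, 100.0, 1000.0]
    ratios = [50.0 / c for c in competitor]
    print(equivalence_point(competitor, ratios))
```

Repeating the estimate on RNA sampled at a later time and dividing the two results gives a relative induction value of the kind plotted for individual genes.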

In addition, modification of existing promoters and/or the isolation of alternative
promoters provides increased expression of CYP and CPR genes. Strong promoters are obtained
from at least four sources: random or specific modifications of the CYP52A2 promoter,
CYP52A5 promoter, CYP52A1 promoter, the selection of a strong promoter from available
Candida β-oxidation genes such as POX4 and POX5, or screening to select another suitable
Candida promoter.

Promoter strength can be directly measured using QC-RT-PCR to measure CYP
and CPR gene expression in Candida cells isolated from fermentors. Enzymatic assays and
antibodies specific for CYP and CPR proteins are used to verify that increased promoter strength
is reflected by increased synthesis of the corresponding enzymes. Once a suitable promoter is
identified, it is fused to the selected CYP and CPR genes and introduced into Candida for
construction of a new improved production strain. It is contemplated that the coding region of
the CYP and CPR genes can be fused to suitable promoters or other regulatory sequences which
are well known to those skilled in the art.

In accordance with the present invention, studies on C. tropicalis ATCC 20336
have identified six unique CYP genes and four potential alleles. QC-RT-PCR analyses of cells
isolated during the course of the fermentation bioconversions indicate that at least three of the
CYP genes are induced by fatty acids and at least two of the CYP genes are induced by alkanes.
See Figure 34. Two of the CYP genes are highly induced indicating participation in the ω-
hydroxylase complex which catalyzes the rate limiting step in the oxidation of fatty acids to the
corresponding diacids.

The biochemical characterization of each P450 enzyme herein is used to tailor
the C. tropicalis host for optimal diacid productivity and is used to select P450 enzymes to be
amplified based upon the fatty acid content of the feedstream. CYP gene(s) encoding P450
enzymes that have a low specific activity for the fatty acid or alkane substrate of choice are
targeted for inactivation, thereby reducing the physiological load on the cell.
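
The selection logic in the preceding paragraph can be sketched as a simple scoring rule. The text does not specify a numerical criterion, so everything below is an assumption made for illustration: each gene is scored by its specific activity weighted by the feed composition, and a hypothetical threshold separates amplification candidates from inactivation candidates. The gene labels and activity values are placeholders, not measured data.

```python
def classify_cyp_genes(specific_activity, feed_fractions, threshold):
    """Split genes into amplification and inactivation candidates.

    specific_activity: {gene: {substrate: activity}} (placeholder units)
    feed_fractions:    {substrate: fraction of the feedstream}
    threshold:         hypothetical cut-off on the feed-weighted score
    """
    amplify, inactivate = [], []
    for gene, activity_by_substrate in specific_activity.items():
        # Feed-weighted score: substrates absent from the gene's record count as zero.
        score = sum(
            fraction * activity_by_substrate.get(substrate, 0.0)
            for substrate, fraction in feed_fractions.items()
        )
        if score >= threshold:
            amplify.append(gene)
        else:
            inactivate.append(gene)
    return sorted(amplify), sorted(inactivate)


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    activity = {
        "gene_x": {"C18:1": 10.0, "C16:0": 2.0},
        "gene_y": {"C18:1": 0.5},
    }
    feed = {"C18:1": 0.7, "C16:0": 0.3}
    print(classify_cyp_genes(activity, feed, threshold=1.0))
```

In practice the cut-off, and whether a single score is appropriate at all, would depend on the measured characterization of each enzyme.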

Since it has been demonstrated that CPR can be limiting in yeast systems, the
removal of non-essential P450s from the system can free electrons that are being used by non-
essential P450s and make them available to the P450s important for diacid productivity.
Moreover, the removal of non-essential P450s can make available other necessary but potentially
limiting components of the P450 system (i.e., available membrane space, heme and/or NADPH).

Diacid productivity is thus improved by selective integration, amplification, and
over expression of CYP and CPR genes in the C. tropicalis production host.

It should be understood that host cells into which one or more copies of desired
CYP and/or CPR genes have been introduced can be made to include such genes by any
technique known to those skilled in the art. For example, suitable host cells include procaryotes
such as Bacillus sp., Pseudomonas sp., Actinomycetes sp., Escherichia sp., Mycobacterium sp., and
eukaryotes such as yeast, algae, insect cells, plant cells and filamentous fungi. Suitable host
cells are preferably yeast cells such as Yarrowia, Debaryomyces, Saccharomyces,
Schizosaccharomyces, and Pichia and more preferably those of the Candida genus. Preferred
species of Candida are tropicalis, maltosa, apicola, paratropicalis, albicans, cloacae,
guillermondii, intermedia, lipolytica, parapsilosis and zeylenoides. Certain preferred strains of
Candida tropicalis are listed in U.S. Patent No. 5,254,466, incorporated herein by reference.

Vectors such as plasmids, phagemids, phages or cosmids can be used to transform
or transfect suitable host cells. Host cells may also be transformed by introducing into a cell a
linear DNA vector(s) containing the desired gene sequence. Such linear DNA may be
advantageous when it is desirable to avoid introduction of non-native (foreign) DNA into the
cell. For example, DNA consisting of a desired target gene(s) flanked by DNA sequences which
are native to the cell can be introduced into the cell by electroporation, lithium acetate
transformation, spheroplasting and the like. Flanking DNA sequences can include selectable
markers and/or other tools for genetic engineering.

A suitable organic substrate herein can be any organic compound that is
biooxidizable to a mono- or polycarboxylic acid. Such a compound can be any saturated or
unsaturated aliphatic compound or any carbocyclic or heterocyclic aromatic compound having at
least one terminal methyl group, a terminal carboxyl group and/or a terminal functional group
which is oxidizable to a carboxyl group by biooxidation. A terminal functional group which is a
derivative of a carboxyl group may be present in the substrate molecule and may be converted to
a carboxyl group by a reaction other than biooxidation. For example, if the terminal group is an
ester whose hydrolysis to a carboxyl group is permitted by neither the wild-type C. tropicalis nor
the genetic modifications described herein, then a lipase can be added during
the fermentation step to liberate free fatty acids. Suitable organic substrates include, but are not
limited to, saturated fatty acids, unsaturated fatty acids, alkanes, alkenes, alkynes and
combinations thereof.

Alkanes are a type of saturated organic substrate which are useful herein. The
alkanes can be linear or cyclic, branched or straight chain, substituted or unsubstituted.
Particularly preferred alkanes are those having from about 4 to about 25 carbon atoms, examples
of which include but are not limited to butane, hexane, octane, nonane, dodecane, tridecane,
tetradecane, octadecane and the like.

Examples of unsaturated organic substrates which can be used herein include but
are not limited to internal olefins such as 2-pentene, 2-hexene, 3-hexene, 9-octadecene and the
like; unsaturated carboxylic acids such as 2-hexenoic acid and esters thereof, oleic acid and esters
thereof including triglyceryl esters having a relatively high oleic acid content, erucic acid and
esters thereof including triglyceryl esters having a relatively high erucic acid content, ricinoleic
acid and esters thereof including triglyceryl esters having a relatively high ricinoleic acid content,
linoleic acid and esters thereof including triglyceryl esters having a relatively high linoleic acid
content; unsaturated alcohols such as 3-hexen-1-ol, 9-octadecen-1-ol and the like; unsaturated
aldehydes such as 3-hexen-1-al, 9-octadecen-1-al and the like. In addition to the above,
organic substrates which can be used herein include alicyclic compounds having at least one
internal carbon-carbon double bond and at least one terminal methyl group, a terminal carboxyl
group and/or a terminal functional group which is oxidizable to a carboxyl group by
biooxidation. Examples of such compounds include but are not limited to 3,6-dimethyl-1,4-
cyclohexadiene; 3-methylcyclohexene; 3-methyl-1,4-cyclohexadiene and the like.

Examples of the aromatic compounds that can be used herein include but are not
limited to arenes such as o-, m-, p-xylene; o-, m-, p-methyl benzoic acid; dimethyl pyridine, and
the like. The organic substrate can also contain other functional groups that are biooxidizable to
carboxyl groups such as an aldehyde or alcohol group. The organic substrate can also contain
other functional groups that are not biooxidizable to carboxyl groups and do not interfere with
the biooxidation such as halogens, ethers, and the like.

Examples of saturated fatty acids which may be applied to cells incorporating the
present CYP and CPR genes include caproic, enanthic, caprylic, pelargonic, capric, undecylic,
lauric, myristic, pentadecanoic, palmitic, margaric, stearic, arachidic, behenic acids and
combinations thereof. Examples of unsaturated fatty acids which may be applied to cells
incorporating the present CYP and CPR genes include palmitoleic, oleic, erucic, linoleic,
linolenic acids and combinations thereof. Alkanes and fractions of alkanes may be applied
which include chain links from C12 to C24 in any combination. Examples of preferred fatty
acid mixtures are Emersol® 267 and Tallow, both commercially available from Henkel
Chemicals Group, Cincinnati, OH. The typical fatty acid composition of Emersol® 267 and
Tallow is as follows:



            TALLOW     E267
C14:0        3.5%      2.4%
C14:1        1.0%      0.7%
C15:0        0.5%       -
C16:0       25.5%      4.6%
C16:1        4.0%      5.7%
C17:0        2.5%       -
C17:1         -        5.7%
C18:0       19.5%      1.0%
C18:1       41.0%     69.9%
C18:2        2.5%      8.8%
C18:3         -        0.3%
C20:0        0.5%       -
C20:1         -        0.9%

The following examples are meant to illustrate but not to limit the invention. All
relevant microbial strains and plasmids are described in Table 1 and Table 2, respectively.
Table 1. List of Escherichia coli and Candida tropicalis strains

E. coli STRAIN   GENOTYPE                                                SOURCE
XL1Blue-MRF'     endA1, gyrA96, hsdR17, lac-, recA1, relA1, supE44,      Stratagene, La Jolla, CA
                 [F' lacIqZΔM15, proAB, Tn10]
BM25.8           supE44, thi Δ(lac-proAB) [F' traD36, proAB+,            Clontech, Palo Alto, CA
                 lacIqZΔM15] λimm434 (kanR) P1 (camR) hsdR (rK12- mK12+)
XLOLR            Δ(mcrA)183 Δ(mcrCB-hsdSMR-mrr)173 endA1 thi-1 recA1     Stratagene, La Jolla, CA
                 gyrA96 relA1 lac [F' proAB lacIqZΔM15 Tn10 (TetR)]
                 Su- (nonsuppressing) λR (lambda resistant)

C. tropicalis STRAIN   GENOTYPE                                          SOURCE
ATCC 20336       Wild-type                                               American Type Culture
                                                                         Collection, Rockville, MD
ATCC 750         Wild-type                                               American Type Culture
                                                                         Collection, Rockville, MD
ATCC 20962       ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,                 Henkel
                 pox5::ura3A/pox5::URA3A
H5343 ura-       ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,                 Henkel
                 pox5::ura3A/pox5::URA3A, ura3-
HDC1             ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,                 Henkel
                 pox5::ura3A/pox5::URA3A, ura3::URA3A-CYP52A2A
HDC5             ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,                 Henkel
                 pox5::ura3A/pox5::URA3A, ura3::URA3A-CYP52A3A
HDC10            ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,                 Henkel
                 pox5::ura3A/pox5::URA3A, ura3::URA3A-CPRB
HDC15            ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,                 Henkel
                 pox5::ura3A/pox5::URA3A, ura3::URA3A-CYP52A5A
HDC20            ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,                 Henkel
                 pox5::ura3A/pox5::URA3A, ura3::URA3A-CYP52A2A + CPRB
                 (CYP and CPR have opposite 5' to 3' orientation with
                 respect to each other)
HDC23            ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,                 Henkel
                 pox5::ura3A/pox5::URA3A, ura3::URA3A-CYP52A2A + CPRB
                 (CYP and CPR have same 5' to 3' orientation with
                 respect to each other)

Table 2. List of plasmids isolated from genomic libraries and constructed for use
in gene integrations.

Plasmid    Base vector  Insert               Insert size     Plasmid size     Description
pURAin     pNEB193      URA3A                1706 bp         4399 bp          pNEB193 with the URA3A gene inserted in the AscI - PmeI site, generating a PacI site
pURA2in    pURAin       CYP52A2A             2230 bp         6629 bp          pURAin containing a PCR CYP52A2A allele containing PacI restriction sites
pURA       pURAin       CPRB                 3266 bp         7665 bp          pURAin containing a PCR CPRB allele containing PacI restriction sites
pHKM1      pTriplEx     Truncated CPRA gene  Approx. 3.8 kb  Approx. 7.4 kb   A truncated CPRA gene obtained by first screening library containing the 5' untranslated region and 1.2 kb open reading frame
pHKM4      pTriplEx     Truncated CPRA gene  Approx. 5 kb    Approx. 8.6 kb   A truncated CPRA gene obtained by screening second library containing the 3' untranslated region end sequence
pHKM9      pBK-CMV      CPRB gene            Approx. 5.3 kb  Approx. 9.8 kb   CPRB allele isolated from the third library
pHKM11     pBK-CMV      CYP52A1A             Approx. 5 kb    Approx. 9.5 kb   CYP52A1A isolated from the third library
pHKM12     pBK-CMV      CYP52A8A             Approx. 7.5 kb  Approx. 12 kb    CYP52A8A isolated from the third library
pHKM13     pBK-CMV      CYP52D4A             Approx. 7.3 kb  Approx. 11.8 kb  CYP52D4A isolated from the third library
pHKM14     pBK-CMV      CYP52A2B             Approx. 6 kb    Approx. 10.5 kb  CYP52A2B isolated from the third library
pHKM15     pBK-CMV      CYP52A8B             Approx. 6.6 kb  Approx. 11.1 kb  CYP52A8B isolated from the third library
pPAL3      pTriplEx     CYP52A5A             4.4 kb          Approx. 8.1 kb   CYP52A5A isolated from the 1st library
pPA5       pTriplEx     CYP52A5B             4.1 kb          Approx. [value not legible]  CYP52A5B isolated from the 2nd library
pPA15      pTriplEx     CYP52A2A             6.0 kb          Approx. 9.7 kb   CYP52A2A isolated from the 2nd library
pPA57      pTriplEx     CYP52A3A             5.5 kb          Approx. 9.2 kb   CYP52A3A isolated from the 2nd library
pPA62      pTriplEx     CYP52A3B             6.0 kb          Approx. 9.7 kb   CYP52A3B isolated from the 2nd library
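
The sizes listed in Table 2 can be cross-checked with simple arithmetic: for a construct made by inserting a fragment into a base vector, the plasmid size should be close to the base vector size plus the insert size. The following Python sketch is illustrative only. It uses the base-pair values legible in Table 2 for the two pURAin-derived constructs and, as an assumption, ignores any bases gained or lost at the cloning site. The helper names are hypothetical.

```python
# Sizes in bp as listed in Table 2 for the two PCR-insert constructs built on pURAin.
PURAIN_BP = 4399

constructs = {
    # insert name: (insert size in bp, plasmid size reported in Table 2 in bp)
    "CYP52A2A": (2230, 6629),
    "CPRB": (3266, 7665),
}


def expected_size(base_bp: int, insert_bp: int) -> int:
    """Idealized construct size: base vector plus insert (cloning-site changes ignored)."""
    return base_bp + insert_bp


def implied_backbone_kb(plasmid_kb: float, insert_kb: float) -> float:
    """Vector backbone size implied by an approximate plasmid/insert pair, to 0.1 kb."""
    return round(plasmid_kb - insert_kb, 1)


for name, (insert_bp, reported_bp) in constructs.items():
    calc = expected_size(PURAIN_BP, insert_bp)
    print(f"{name}: {PURAIN_BP} + {insert_bp} = {calc} bp (Table 2: {reported_bp} bp)")

# Approximate kb values for two library clones carried on the same excised vector.
print("pHKM9 backbone (kb):", implied_backbone_kb(9.8, 5.3))
print("pHKM12 backbone (kb):", implied_backbone_kb(12.0, 7.5))
```

The same subtraction applied to the other library clones gives a quick consistency check on the approximate sizes reported for each base vector.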



EXAMPLE 1

Purification of Genomic DNA from Candida tropicalis ATCC 20336

A. Construction of Genomic Libraries

50 ml of YEPD broth (see Chart) was inoculated with a single colony of C.
tropicalis 20336 from a YEPD agar plate and grown overnight at 30°C. 5 ml of the overnight
culture was inoculated into 100 ml of fresh YEPD broth and incubated at 30°C for 4 to 5 hr with
shaking. Cells were harvested by centrifugation, washed twice with sterile distilled water,
resuspended in 4 ml of spheroplasting buffer (1 M sorbitol, 50 mM EDTA, 14 mM
mercaptoethanol) and incubated for 30 min at 37°C with gentle shaking. 0.5 ml of 2 mg/ml
zymolyase (ICN Pharmaceuticals, Inc., Irvine, CA) was added and incubation was continued at
37°C with gentle shaking for 30 to 60 min. Spheroplast formation was monitored by SDS lysis.
Spheroplasts were harvested by brief centrifugation (4,000 rpm, 3 min) and were washed once
with the spheroplasting buffer without mercaptoethanol. Harvested spheroplasts were then
suspended in 4 ml of lysis buffer (0.2 M Tris/pH 8.0, 50 mM EDTA, 1% SDS) containing
100 µg/ml RNase (Qiagen Inc., Chatsworth, CA) and incubated at 37°C for 30 to 60 min.

Proteins were denatured and extracted twice with an equal volume of
chloroform/isoamyl alcohol (24:1) by gently mixing the two phases by hand inversions. The two
phases were separated by centrifugation at 10,000 rpm for 10 min and the aqueous phase
containing the high-molecular weight DNA was recovered. To the aqueous layer NaCl was
added to a final concentration of 0.2 M and the DNA was precipitated by adding 2 vol of ethanol.
Precipitated DNA was spooled with a clean glass rod, resuspended in TE buffer (10 mM
Tris/pH 8.0, 1 mM EDTA) and allowed to dissolve overnight at 4°C. To the dissolved DNA,
RNase free of any DNase activity (Qiagen Inc., Chatsworth, CA) was added to a final
concentration of 50 µg/ml and incubated at 37°C for 30 min. Then protease (Qiagen Inc.,
Chatsworth, CA) was added to a final concentration of 100 µg/ml and incubated at 55 to 60°C
for 30 min. The solution was extracted once with an equal volume of phenol/chloroform/isoamyl
alcohol (25:24:1) and once with an equal volume of chloroform/isoamyl alcohol (24:1). To the
aqueous phase 0.1 vol of 3 M sodium acetate and 2 volumes of ice cold ethanol (200 proof) were
added and the high molecular weight DNA was spooled with a glass rod and dissolved in 1 to 2
ml of TE buffer.

B. Genomic DNA Preparation for PCR
Amplification of CYP and CPR Genes

Five ml of YPD medium was inoculated with a single colony and grown at
30°C overnight. The culture was centrifuged for 5 min at 1200 x g. The supernatant was
removed by aspiration and 0.5 ml of a sorbitol solution (0.9 M sorbitol, 0.1 M Tris-Cl pH 8.0,
0.1 M EDTA) was added to the pellet. The pellet was resuspended by vortexing and 1 µl of 2-
mercaptoethanol and 50 µl of a 10 µg/ml zymolyase solution were added to the mixture. The
tube was incubated at 37°C for 1 hr on a rotary shaker (200 rpm). The tube was then
centrifuged for 5 min at 1200 x g and the supernatant was removed by aspiration. The protoplast
pellet was resuspended in 0.5 ml 1x TE (10 mM Tris-Cl pH 8.0, 1 mM EDTA) and transferred
to a 1.5 ml microcentrifuge tube. The protoplasts were lysed by the addition of 50 µl 10% SDS
followed by incubation at 65°C for 20 min. Next, 200 µl of 5 M potassium acetate was added
and after mixing, the tube was incubated on ice for at least 30 min. Cellular debris was removed
by centrifugation at 13,000 x g for 5 min. The supernatant was carefully removed and
transferred to a new microfuge tube. The DNA was precipitated by the addition of 1 ml 100%
(200 proof) ethanol followed by centrifugation for 5 min at 13,000 x g. The DNA pellet was
washed with 1 ml 70% ethanol followed by centrifugation for 5 min at 13,000 x g. After
partially drying the DNA under a vacuum, it was resuspended in 200 µl of 1x TE. The DNA
concentration was determined by ratio of the absorbance at 260 nm / 280 nm (A260/280).
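
Absorbance readings of this kind are commonly converted into a concentration and a purity ratio, as in the following Python sketch. It is illustrative only. The conversion factor of 50 µg/ml per A260 unit for double-stranded DNA is a widely used laboratory convention and is an assumption here, not a value stated in the text. The readings and dilution factor are hypothetical.

```python
def dsdna_conc_ug_per_ml(a260: float, dilution_factor: float = 1.0,
                         ug_per_ml_per_a260: float = 50.0) -> float:
    """Estimate dsDNA concentration from A260 (assumes 1 A260 unit ~ 50 ug/ml dsDNA)."""
    return a260 * dilution_factor * ug_per_ml_per_a260


def a260_a280_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio, a common indicator of protein contamination in a DNA prep."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


# Hypothetical readings for a sample diluted 50-fold before measurement.
a260, a280, dilution = 0.25, 0.14, 50.0
print("dsDNA (ug/ml):", dsdna_conc_ug_per_ml(a260, dilution))
print("A260/A280:", round(a260_a280_ratio(a260, a280), 2))
```

A ratio in the neighbourhood of 1.8 is generally taken to indicate DNA reasonably free of protein; the concentration itself comes from the A260 reading rather than from the ratio.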

EXAMPLE 2

Construction of Candida tropicalis 20336 Genomic Libraries

Three genomic libraries of C. tropicalis were constructed, two at Clontech
Laboratories, Inc., (Palo Alto, CA) and one at Henkel Corporation (Cincinnati, OH).

A. Clontech Libraries

The first Clontech library was made as follows: Genomic DNA was prepared
from C. tropicalis 20336 as described above, partially digested with EcoRI and size fractionated
by gel electrophoresis to eliminate fragments smaller than 0.6 kb. Following size fractionation,
several ligations of the EcoRI genomic DNA fragments and lambda (λ) TriplEx™ vector (Figure
1) arms with EcoRI sticky ends were packaged into λ phage heads under conditions designed
to obtain one million independent clones. The second genomic library was constructed as
follows: Genomic DNA was digested partially with Sau3AI and size fractionated by gel
electrophoresis. The DNA fragments were blunt ended using standard protocols as described,
e.g., in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2ed., Cold Spring Harbor
Press, USA (1989), incorporated herein by reference. The strategy was to fill in the Sau3AI
overhangs with Klenow polymerase (Life Technologies, Grand Island, NY) followed by
digestion with S1 nuclease (Life Technologies, Grand Island, NY). After S1 nuclease digestion
the fragments were end filled one more time with Klenow polymerase to obtain the final blunt-
ended DNA fragments. EcoRI linkers were ligated to these blunt-ended DNA fragments
followed by ligation into the λTriplEx vector. The resultant library contained approximately 2 X
10^[exponent illegible] independent clones with an average insert size of 4.5 kb.

B. Henkel Library

The third genomic library was constructed at Henkel Corporation using λZAP
Express™ vector (Stratagene, La Jolla, CA) (Figure 2). Genomic DNA was partially digested
with Sau3AI and fragments in the range of 6 to 12 kb were purified from an agarose gel after
electrophoresis of the digested DNA. These DNA fragments were then ligated to BamHI
digested λZAP Express™ vector arms according to the manufacturer's protocols. Three ligations
were set up to obtain approximately 9.8 X 10^[exponent illegible] independent clones. All three
libraries were pooled and amplified according to manufacturer instructions to obtain high-titre
(>10^[exponent illegible] plaque forming units/ml) stock for long-term storage. The titre of
packaged phage library was ascertained after infection of E. coli XL1Blue-MRF' cells. E. coli
XL1Blue-MRF' were grown overnight in either LB medium or NZCYM (see Chart) containing
10 mM MgSO4 and 0.2% maltose at 37°C or 30°C, respectively, with shaking. Cells were then
centrifuged and resuspended in 0.5 to 1 volume of 10 mM MgSO4. 200 µl of this E. coli culture
was mixed with several dilutions of packaged phage library and incubated at 37°C for 15 min.
To this mixture 2.5 ml of LB top agarose or NZCYM top agarose (maintained at 60°C) (see
Chart) was added and plated on LB agar or NZCYM agar (see Chart) present in 82 mm petri
dishes. Phage were allowed to propagate overnight at 37°C to obtain discrete plaques and the
phage titre was determined.
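
A titre determined by plating dilutions of a phage stock is conventionally calculated from the plaque count, the dilution and the volume of diluted phage plated. The Python sketch below shows that arithmetic. It is illustrative only: the function name is hypothetical, and the plaque count, dilution factor and plated volume are made-up example values, not data from this example.

```python
def phage_titre_pfu_per_ml(plaque_count: int, dilution_factor: float,
                           plated_volume_ml: float) -> float:
    """Titre of the undiluted stock in plaque forming units (pfu) per ml.

    dilution_factor is the fold-dilution of the plated sample (e.g. 1e6 for a
    10^-6 dilution); plated_volume_ml is the volume of diluted phage added to cells.
    """
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return plaque_count * dilution_factor / plated_volume_ml


# Hypothetical plate: 150 plaques from 0.1 ml of a 10^-6 dilution.
titre = phage_titre_pfu_per_ml(150, 1e6, 0.1)
print(f"Titre: {titre:.2e} pfu/ml")
```

In practice only plates with well-separated, countable plaques are used for the calculation, which is why several dilutions are plated.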

EXAMPLE 3

Screening of Genomic Libraries

Both λTriplEx™ and λZAP Express™ vectors are phagemid vectors that can be
propagated either as phage or plasmid DNA (after conversion of phage to plasmid). Therefore,
the genomic libraries constructed in these vectors can be screened either by plaque hybridization
(screening of lambda form of library) or by colony hybridization (screening plasmid form of
library after phage to plasmid conversion). Both vectors are capable of expressing the cloned
genes and the main difference is the mechanism of excision of plasmid from the phage DNA.
The cloning site in λTriplEx™ is located within a plasmid which is present in the phage and is
flanked by loxP sites (Figure 1). When λTriplEx™ is introduced into E. coli strain BM25.8
(supplied by Clontech), the Cre recombinase present in BM25.8 promotes the excision and
circularization of plasmid pTriplEx from the phage λTriplEx™ at the loxP sites. The
mechanism of excision of plasmid pBK-CMV from phage λZAP Express™ is different. It
requires the assistance of a helper phage such as ExAssist™ (Stratagene) and an E. coli strain
such as XLOLR (Stratagene). Both pTriplEx and pBK-CMV can replicate autonomously in E.
coli.

A. Screening Genomic Libraries (Plasmid Form)

1) Colony Lifts

A single colony of E. coli BM25.8 was inoculated into 5 ml of LB containing 50
µg/ml kanamycin, 10 mM MgSO4 and 0.1% maltose and grown overnight at 31°C, 250 rpm. To
200 µl of this overnight culture (approximately 4 X 10^[exponent illegible] cells) 1 µl of phage
library (2-5 X 10^[exponent illegible] plaque forming units) and 150 µl LB broth were added and
incubated at 31°C for 30 min, after which 400 µl of LB broth was added and incubated at 31°C,
225 rpm for 1 h. This bacterial culture was diluted and plated on LB agar containing 50 µg/ml
ampicillin (Sigma Chemical Company, St. Louis, MO) and kanamycin (Sigma Chemical
Company) to obtain 500 to 600 colonies/plate. The plates were incubated at 37°C for 6 to 7 hrs
until the colonies became visible. The plates were then stored at 4°C for 1.5 h before placing a
Colony/Plaque Screen™ Hybridization Transfer Membrane disc (DuPont NEN Research
Products, Boston, MA) on the plate in contact with bacterial colonies. The transfer of colonies to
the membrane was allowed to proceed for 3 to 5 min. The membrane was then lifted and placed
on a fresh LB agar (see Chart) plate containing 200 µg/ml of chloramphenicol with the side
exposed to the bacterial colonies facing up. The plates containing the membranes were then
incubated at 37°C overnight in order to allow full development of the bacterial colonies. The LB
agar plates from which colonies were initially lifted were incubated at 37°C overnight and stored
at 4°C for future use. The following morning the membranes containing bacterial colonies were
lifted and placed on two sheets of Whatman 3M (Whatman, Hillsboro, OR) paper saturated with
0.5 N NaOH and left at room temperature (RT) for 3 to 6 min to lyse the cells. Additional
treatment of membranes was as described in the protocol provided by NEN Research Products.

2) DNA Hybridizations

Membranes were dried overnight before hybridizing to oligonucleotide probes
prepared using a non-radioactive ECL™ 3'-oligolabelling and detection system from Amersham
Life Sciences (Arlington Heights, IL). DNA labeling, prehybridization and hybridizations were
performed according to manufacturer's protocols. After hybridization, membranes were washed
twice at room temperature in 5 X SSC, 0.1% SDS (in a volume equivalent to 2 ml/cm² of
membrane) for 5 min each followed by two washes at 50°C in 1 X SSC, 0.1% SDS (in a volume
equivalent to 2 ml/cm² of membrane) for 15 min each. The hybridization signal was then
generated and detected with Hyperfilm ECL™ (Amersham) according to manufacturer's
protocols. Membranes were aligned to plates containing bacterial colonies from which colony
lifts were performed and colonies corresponding to positive signals on X-ray film were then
isolated and propagated in LB broth. Plasmid DNAs were isolated from these cultures and
analyzed by restriction enzyme digestions and by DNA sequencing.

B. Screening Genomic Libraries (Plaque Form)

1) λ Library Plating

E. coli XL1Blue-MRF' cells were grown overnight in LB medium (25 ml)
containing 10 mM MgSO4 and 0.2% maltose at 37°C, 250 rpm. Cells were then centrifuged
(2,200 X g for 10 min) and resuspended in 0.5 volumes of 10 mM MgSO4. 500 µl of this E. coli
culture was mixed with a phage suspension containing 25,000 amplified lambda phage particles
and incubated at 37°C for 15 min. To this mixture 6.5 ml of NZCYM top agarose (maintained at
60°C) (see Chart) was added and plated on 80 - 100 ml NZCYM agar (see Chart) present in a
150 mm petri dish. Phage were allowed to propagate overnight at 37°C to obtain discrete
plaques. After overnight growth plates were stored in a refrigerator for 1-2 hr before plaque lifts
were performed.

2) Plaque Lift and DNA Hybridizations

Magna Lift™ nylon membranes (Micron Separations, Inc., Westborough, MA)
were placed on the agar surface in complete contact with λ plaques and transfer of plaques to
nylon membranes was allowed to proceed for 5 min at RT. After plaque transfer the membrane
was placed on 2 sheets of Whatman 3M™ (Whatman, Hillsboro, OR) filter paper saturated with
a 0.5 N NaOH, 1.0 M NaCl solution and left for 10 min at RT to denature DNA. Excess
denaturing solution was removed by blotting briefly on dry Whatman 3M paper. Membranes
were then transferred to 2 sheets of Whatman 3M™ paper saturated with 0.5 M Tris-HCl (pH
8.0), 1.5 M NaCl and left for 5 min to neutralize. Membranes were then briefly washed in 200 -
500 ml of 2 X SSC, dried by air and baked for 30 - 40 min at 80°C. The membranes were then
probed with labelled DNA.

Membranes were prewashed with a 200 - 500 ml solution of 5 X SSC, 0.5% SDS,
1 mM EDTA (pH 8.0) for 1 - 2 hr at 42°C with shaking (60 rpm) to get rid of bacterial debris
from the membranes. The membranes were prehybridized for 1 - 2 hr at 42°C with ECL Gold™
buffer (Amersham) containing 0.5 M NaCl and 5% blocking reagent (in a volume equivalent to
0.125 - 0.25 ml/cm² of membrane). DNA fragments that were used as probes were purified from
agarose gel using a QIAEX II™ gel extraction kit (Qiagen Inc., Chatsworth, CA) according to
manufacturer's protocol and labeled using an Amersham ECL™ direct nucleic acid labeling kit
(Amersham). Labeled DNA (5-10 ng/ml hybridization solution) was added to the prehybridized
membranes and the hybridization was allowed to proceed overnight. The following day
membranes were washed with shaking (60 rpm) twice at 42°C for 20 min each time in a buffer
containing either 0.1 (high stringency) or 0.5 (low stringency) X SSC, 0.4% SDS and 360 g/l
urea (in a volume equivalent to 2 ml/cm² of membrane). This was followed by two 5 min
washes at room temperature in 2 X SSC (in a volume equivalent to 2 ml/cm² of membrane).
Hybridization signals were generated using the ECL™ nucleic acid detection reagent and
detected using Hyperfilm ECL™ (Amersham).

Agar plugs which contained plaques corresponding to positive signals on the X-
ray film were taken from the master plates using the broad end of a Pasteur pipet. Plaques were
selected by aligning the plates with the X-ray film. At this stage, multiple plaques were generally
taken. Phage particles were eluted from the agar plugs by soaking in 1 ml SM buffer (Sambrook
et al., supra) overnight. The phage eluate was then diluted and plated with freshly grown E. coli
XL1Blue-MRF' cells to obtain 100 - 500 plaques per 85 mm NZCYM agar plate. Plaques were
transferred to Magna Lift nylon membranes as before and probed again using the same probe.
Single well-isolated plaques corresponding to signals on X-ray film were picked by removing
agar plugs and eluting the phage by soaking overnight in 0.5 ml SM buffer.

C. Conversion of λ Clones to Plasmid Form

The lambda clones isolated were converted to plasmid form for further analysis.
Conversion from the plaque to the plasmid form was accomplished by infecting E. coli strain
BM25.8 with the eluted phage. The E. coli strain was grown overnight at 31°C, 250 rpm in LB
broth containing 10 mM MgSO4 and 0.2% maltose until the OD600 reached 1.1 - 1.4. Ten
milliliters of the overnight culture was removed and mixed with 100 µl of 1 M MgCl2. A 200 µl
volume of cells was removed, mixed with 150 µl of eluted phage suspension and incubated at
31°C for 30 min. LB broth (400 µl) was added to the tube and incubation was continued at 31°C
for 1 hr with shaking, 250 rpm. 1 - 10 µl of the infected cell suspension was plated on LB agar
containing 100 µg/ml ampicillin (Sigma, St. Louis, MO). Well-isolated colonies were picked
and grown overnight in 5 ml LB broth containing 100 µg/ml ampicillin at 250 rpm.
Plasmid DNA was isolated from these cultures and analyzed. To convert the λZAP Express™
vector to plasmid form E. coli strains XL1Blue-MRF' and XLOLR were used. The conversion
was performed according to the manufacturer's (Stratagene) protocols for single-plaque
excision.

EXAMPLE 4

Transformation of C. tropicalis H5343 ura-

A. Transformation of C. tropicalis H5343 by Electroporation

5 ml of YEPD was inoculated with C. tropicalis H5343 ura- from a frozen
stock and incubated overnight on a New Brunswick shaker at 30°C and 170 rpm. The next day,
10 µl of the overnight culture was inoculated into 100 ml YEPD and growth was continued at
30°C, 170 rpm. The following day the cells were harvested at an OD600 of 1.0 and the cell
pellet was washed one time with sterile ice-cold water. The cells were resuspended in ice-cold
sterile 35% polyethylene glycol (4,000 MW) to a density of 5 x 10^[exponent illegible] cells/ml.
A 0.1 ml volume of cells was utilized for each electroporation. The following electroporation
protocol was followed: 1.0 µg of transforming DNA was added to 0.1 ml cells, along with 5 µg
denatured, sheared calf thymus DNA and the mixture was allowed to incubate on ice for 15 min.
The cell solution was then transferred to an ice-cold 0.2 cm electroporation cuvette, tapped to
make sure the solution was on the bottom of the cuvette and electroporated. The cells were
electroporated using an Invitrogen electroporator (Carlsbad, CA) at 450 Volts, 200 Ohms and
250 µF. Following electroporation, 0.9 ml SOS media (1 M sorbitol, 30% YEPD, 10 mM CaCl2)
was added to the suspension. The resulting culture was grown for 1 hr at 30°C, 170 rpm.
Following the incubation, the cells were pelleted by centrifugation at 1500 x g for 5 min. The
electroporated cells were resuspended in 0.2 ml of 1 M sorbitol and plated on synthetic complete
media minus uracil (SC - uracil) (Nelson, supra). In some cases the electroporated cells were
plated directly onto SC - uracil. Growth of transformants was monitored for 5 days. After three
days, several transformants were picked and transferred to SC - uracil plates for genomic DNA
preparation and screening.
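
The electroporation settings quoted above (450 Volts, 200 Ohms, 250 µF, 0.2 cm cuvette) can be translated into two nominal pulse parameters: field strength (voltage divided by electrode gap) and the RC time constant (resistance times capacitance). The Python sketch below is illustrative only. It assumes an ideal exponential-decay pulse in which the set resistance dominates; the actual pulse delivered depends on the sample's own resistance and on the instrument, neither of which is described in the text. The function names are hypothetical.

```python
def field_strength_kv_per_cm(voltage_v: float, gap_cm: float) -> float:
    """Nominal field strength across the cuvette gap, in kV/cm."""
    return voltage_v / gap_cm / 1000.0


def rc_time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    """Nominal RC time constant in milliseconds (ohm * microfarad = microseconds)."""
    return resistance_ohm * capacitance_uf / 1000.0


# Settings quoted in the protocol.
print("Field strength (kV/cm):", round(field_strength_kv_per_cm(450.0, 0.2), 2))
print("Time constant (ms):", rc_time_constant_ms(200.0, 250.0))
```

These figures are useful mainly for comparing settings between instruments or cuvette sizes, since the same voltage in a wider cuvette gives a proportionally weaker field.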

B. Transformation of C. tropicalis Using Lithium Acetate

The following protocol was used to transform C. tropicalis in accordance with the
procedures described in Current Protocols in Molecular Biology, Supplement 5, 13.7.1 (1989),
incorporated herein by reference.

5 ml of YEPD was inoculated with C. tropicalis H5343 ura- from a frozen stock
and incubated overnight on a New Brunswick shaker at 30°C and 170 rpm. The next day, 10 µl
of the overnight culture was inoculated into 50 ml YEPD and growth was continued at 30°C, 170
rpm. The following day the cells were harvested at an OD600 of 1.0. The culture was transferred
to a 50 ml polypropylene tube and centrifuged at 1000 X g for 10 min. The cell pellet was
resuspended in 10 ml sterile TE (10 mM Tris-Cl and 1 mM EDTA, pH 8.0). The cells were again
centrifuged at 1000 X g for 10 min and the cell pellet was resuspended in 10 ml of a sterile
lithium acetate solution [LiAc (0.1 M lithium acetate, 10 mM Tris-Cl, pH 8.0, 1 mM EDTA)].
Following centrifugation at 1000 X g for 10 min, the pellet was resuspended in 0.5 ml LiAc.
This solution was incubated for one hour at 30°C while shaking gently at 50 rpm. A 0.1 ml
aliquot of this suspension was incubated with 5 µg of transforming DNA at 30°C with no
shaking for 30 min. A 0.7 ml PEG solution (40% wt/vol polyethylene glycol 3340, 0.1 M
lithium acetate, 10 mM Tris-Cl, pH 8.0, 1 mM EDTA) was added and incubated at 30°C for 45
min. The tubes were then placed at 42°C for 5 min. A 0.2 ml aliquot was plated on synthetic
complete media minus uracil (SC - uracil) (Kaiser et al., Methods in Yeast Genetics, Cold Spring
Harbor Laboratory Press, USA, 1994, incorporated herein by reference). Growth of
transformants was monitored for 5 days. After three days, several transformants were picked and
transferred to SC - uracil plates for genomic DNA preparation and screening.

EXAMPLE 5

Plasmid DNA Isolation

Plasmid DNA was isolated from E. coli cultures using a Qiagen plasmid isolation
kit (Qiagen Inc., Chatsworth, CA) according to manufacturer's instructions.

EXAMPLE 6

DNA Sequencing and Analysis

DNA sequencing was performed at Sequetech Corporation (Mountain View, CA)
using an Applied Biosystems automated sequencer (Perkin Elmer, Foster City, CA). DNA
sequences were analyzed with MacVector and GeneWorks software packages (Oxford Molecular
Group, Campbell, CA).

EXAMPLE 7

PCR Protocols

PCR amplification was carried out in a Perkin Elmer Thermocycler using the
AmpliTaq Gold enzyme (Perkin Elmer Cetus, Foster City, CA) kit according to manufacturer's
specifications. Following successful amplification, in some cases, the products were digested
with the appropriate enzymes and gel purified using QiaexII (Qiagen, Chatsworth, CA) as per
manufacturer instructions. In specific cases the Ultma Taq polymerase (Perkin Elmer Cetus,
Foster City, CA) or the Expand Hi-Fi Taq polymerase (Boehringer Mannheim, Indianapolis, IN)
were used per manufacturer's recommendations or as defined in Table 3.

Table 3. PCR amplification conditions used with different primer combinations.

PRIMER                            Taq                TEMPLATE       ANNEALING     EXTENSION                    CYCLE
COMBINATION                                          DENATURING     TEMP/TIME     TEMP/TIME                    NUMBER
                                                     CONDITION
3674-41-1/41-2/41-4 + 3674-41-4   AmpliTaq Gold      94°C/30 sec    55°C/30 sec   72°C/1 min                   30
URA Primer 1a, URA Primer 1b      AmpliTaq Gold      95°C/1 min     70°C/1 min    72°C/2 min                   35
URA Primer 2a, URA Primer 2b      AmpliTaq Gold      95°C/1 min     70°C/1 min    72°C/2 min                   35
CYP2A#1                           AmpliTaq Gold      95°C/1 min     70°C/1 min    72°C/2 min                   35
CYP3A#1, CYP3A#2                  Ultma Taq          95°C/1 min     70°C/1 min    72°C/1 min                   30
CPRB#1                            Expand Hi-Fi Taq   94°C/15 sec    50°C/30 sec   68°C/3 min                   10
                                                     94°C/15 sec    50°C/30 sec   68°C/3 min + 20 sec/cycle    15
CYP5A#1, CYP5A#2                  Expand Hi-Fi Taq   94°C/15 sec    50°C/30 sec   68°C/3 min                   10
                                                     94°C/15 sec    50°C/30 sec   68°C/3 min + 20 sec/cycle    15
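
The two-phase Expand Hi-Fi conditions in Table 3 (10 cycles with a fixed 3 min extension, then 15 cycles in which the extension grows by 20 sec per cycle) can be written out as a small data structure to see how the extension time accumulates. The Python sketch below is illustrative only. It assumes that the first cycle of the second phase already carries one 20 sec increment and it counts only hold times, ignoring ramping, any initial denaturation and any final extension, none of which are specified in the table. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    cycles: int
    denature_s: int
    anneal_s: int
    extend_s: int
    extend_increment_s: int = 0  # extra extension time added per successive cycle

    def extension_times(self) -> list:
        """Extension time in seconds for each cycle of this phase."""
        return [self.extend_s + self.extend_increment_s * k
                for k in range(1, self.cycles + 1)]

    def hold_seconds(self) -> int:
        """Total programmed hold time for the phase (no ramp time included)."""
        return sum(self.denature_s + self.anneal_s + t for t in self.extension_times())


# Expand Hi-Fi program as read from Table 3.
expand_hifi = [
    Phase(cycles=10, denature_s=15, anneal_s=30, extend_s=180),
    Phase(cycles=15, denature_s=15, anneal_s=30, extend_s=180, extend_increment_s=20),
]

total_s = sum(phase.hold_seconds() for phase in expand_hifi)
print("Last-cycle extension (s):", expand_hifi[1].extension_times()[-1])
print("Total hold time (min):", total_s / 60)
```

Lengthening the extension step in later cycles is a common way to compensate for declining enzyme activity when amplifying fragments of several kilobases.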





Table 4 below contains a list of primers (SEQ ID NOS: 1-35) used for PCR amplification to
construct gene integration vectors or to generate probes for gene detection and isolation.

Table 4. Primer table for PCR amplification to construct gene integration vectors, to generate
probes for gene isolation and detection and to obtain DNA sequence of constructs. (A-
deoxyadenosine triphosphate [dATP], G- deoxyguanosine triphosphate [dGTP], C-
deoxycytosine triphosphate [dCTP], T- deoxythymidine triphosphate [dTTP], Y- dCTP or dTTP,
R- dATP or dGTP, W- dATP or dTTP, M- dATP or dCTP, N- dATP or dCTP or dGTP or
dTTP). Bases and entries that are not legible in the source are shown as [?] or [illegible].

Target      Patent Primer    Lab Primer   Sequence (5' to 3')                                      PCR
            Name             Name                                                                  Product Size
CYP52A2A    CYP2A#1          3659-72M     CCTTAATTAA[illegible] (SEQ ID NO: 1)                     2230 bp
            CYP2A#2          3659-72N     CCTTAATTAAGCATAAGCTTGCTCGAG (SEQ ID NO: 2)
CYP52A3A    CYP3A#1          [illegible]  CCTTAATTAA[illegible]GAGTG (SEQ ID NO: 3)                [illegible] bp
            CYP3A#2          3659-72P     CCTTAATTAATCGCACTACGGTTATTGGTATCAG (SEQ ID NO: 4)
CYP52A5A    CYP5A#1          3659-72K     CCTTAATTAA[?]CAAAGTACGTTCAGGCG[?] (SEQ ID NO: 5)         3298 bp
            CYP5A#2          3659-72L     CCTTAATTAAGGCAGACAACAACTTGGCAAAGTC (SEQ ID NO: 6)
CPRB        CPRB#1           3698-20A     CCTTAATTAAGAGGTCGTTGGTTGAGTTTTC (SEQ ID NO: 7)           3266 bp
            CPRB#2           3698-20B     CCTTAATTAA[?]GATAATGACGTTGCGGG (SEQ ID NO: 8)
URA3A       URA Primer 1a    3698-7C      AGGCGCGCCGGAGTCCAAAAAGACCAACCTCTG (SEQ ID NO: 9)         956 bp
            URA Primer 1b    3698-7D      CCTTAATTAATACGTGGATACCTTCAAGCAAGTG (SEQ ID NO: 10)
URA3A       URA Primer 2a    3698-7A      CCTTAATTAAGCTCACGAGT[?]TGGGATTTTCGAG (SEQ ID NO: 11)     750 bp
            URA Primer 2b    3698-7B      GGGTTTAAACCGCAGAGGTTGGTCTTTTTGGACTC (SEQ ID NO: 12)
                                          GGGTTTAAAC - PmeI restriction site (SEQ ID NO: 13)
                                          AGGCGCGCC - AscI restriction site (SEQ ID NO: 14)
                                          CCTTAATTAA - PacI restriction site (SEQ ID NO: 15)
CPR         FMN1             3674-41-1    TCYCAAACWGGTACWGCWGAA (SEQ ID NO: 16)
CPR         FMN2             3674-41-2    GG[?]GGTAAYTCWACTTAT (SEQ ID NO: 17)
CPR         FAD              3674-41-3    CGTTATTAYTCYATTTCTTC (SEQ ID NO: 18)
CPR         NADPH            3674-41-4    GCMACACCRGTACCT[?]ACC (SEQ ID NO: 19)
CPR         PRK1.F3          PRK1.F3      ATC[?]AATCGTAATCAGC (SEQ ID NO: 20)
CPR         PRK1.F5          PRK1.F5      ACTTGTCTT[?]TTTAGCA (SEQ ID NO: 21)
CPR         PRK4.R20         PRK4.R20     CTACGTCTGT[?]GTGATGC (SEQ ID NO: 22)
CYP         UCup1            UCup1        CGNGAYACNACNGCNGG (SEQ ID NO: 23)
CYP         UCup2            UCup2        AGRGAYACNACNGCNGG (SEQ ID NO: 24)
CYP         UCdown1          UCdown1      AGNGCRAAYTGYT[?]N[?] (SEQ ID NO: 25)
CYP         UCdown2          UCdown2      YAANGCRAAYTGYTGNCC (SEQ ID NO: 26)
CYP         HemeB1           HemeB1       ATTCAACGGTGGTCCAAGAATCTTGTTTGG (SEQ ID NO: 27)
CYP         2,3,5,P          2,3,5,P      GAGCTATGTTGAGACCACAGTTTGC (SEQ ID NO: 28)
CYP         2,3,5,M          2,3,5,M      CTTCAGTTAAAGCAAATTGTTT[?]CC (SEQ ID NO: 29)
pTriplEx    Triplex5'        Triplex5'    CTCGGGAAGCGCGCCATTGTGTTGG (SEQ ID NO: 30)
vector
pTriplEx    Triplex3'        Triplex3'    TAATACGACTCACTATAGGGCGAATTGGC (SEQ ID NO: 31)
vector
CYP         Cyp52a           Cyp52a       [illegible] (SEQ ID NO: 32)
CYP         Cyp52b           Cyp52b       GGACCGGCGTTAAACSGG (SEQ ID NO: 33)
CYP         Cyp52c           Cyp52c       CATAGTCGWATYATGCTTAGACC (SEQ ID NO: 34)
CYP         Cyp52d           Cyp52d       GGACCACCATTGAATGG (SEQ ID NO: 35)
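
Several primers in Table 4 are degenerate: a code such as Y, R, W, M or N stands for a mixture of bases at that position, so each degenerate primer is really a pool of distinct sequences. The Python sketch below counts and enumerates that pool using the code definitions given in the Table 4 caption. It is illustrative only, with hypothetical function names, and uses two primer sequences that are fully legible in the table (UCup1 and FMN1).

```python
from itertools import product

# Base codes as defined in the Table 4 caption.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT",    # dCTP or dTTP
    "R": "AG",    # dATP or dGTP
    "W": "AT",    # dATP or dTTP
    "M": "AC",    # dATP or dCTP
    "N": "ACGT",  # any of the four
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(CODES[base])
    return count


def expand(primer: str) -> list:
    """Enumerate every concrete sequence in the degenerate primer pool."""
    return ["".join(bases) for bases in product(*(CODES[b] for b in primer))]


ucup1 = "CGNGAYACNACNGCNGG"      # SEQ ID NO: 23
fmn1 = "TCYCAAACWGGTACWGCWGAA"   # SEQ ID NO: 16

print("UCup1 pool size:", degeneracy(ucup1))
print("FMN1 pool size:", degeneracy(fmn1))
print("First FMN1 variants:", expand(fmn1)[:2])
```

Pool size matters in practice because each individual sequence is present at only a fraction of the total primer concentration, which is one reason degenerate primers are often used at higher concentrations or lower annealing temperatures.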
EXAMPLE 8

Yeast Colony PCR Procedure for Confirmation of Gene
Integration into the Genome of C. tropicalis

Single yeast colonies were removed from the surface of transformation plates,
suspended in 50 µl of spheroplasting buffer (50 mM KCl, 10 mM Tris-HCl, pH 8.3, 1.0 mg/ml
Zymolyase, 5% glycerol) and incubated at 37°C for 30 min. Following incubation, the solution
was heated for 10 min at 95°C to lyse the cells. Five µl of this solution was used as a template in
PCR. Expand Hi-Fi Taq polymerase (Boehringer Mannheim, Indianapolis, IN) was used in PCR
coupled with a gene-specific primer (gene to be integrated) and a URA3 primer. If integration
did occur, amplification would yield a PCR product of predicted size confirming the presence of
an integrated gene.

EXAMPLE 9

Fermentation Method for Gene Induction Studies

A fermentor was charged with a semi-synthetic growth medium having the
composition 75 g/l glucose (anhydrous), 6.7 g/l Yeast Nitrogen Base (Difco Laboratories), 3 g/l
yeast extract, 3 g/l ammonium sulfate, 2 g/l monopotassium phosphate, 0.5 g/l sodium chloride.
Components were made as concentrated solutions for autoclaving then added to the fermentor
upon cooling: final pH approximately 5.2. This charge was inoculated with 5-10% of an
overnight culture of C. tropicalis ATCC 20962 prepared in YM medium (Difco Laboratories) as
described in the methods of Examples 17 and 20 of US Patent 5,254,466, which is incorporated
herein by reference. C. tropicalis ATCC 20962 is a POX4 and POX5 disrupted C. tropicalis
ATCC 20336. Air and agitation were supplied to maintain the dissolved oxygen at greater than
about 40% of saturation versus air. The pH was maintained at about 5.0 to 8.5 by the addition of
5N caustic soda on pH control. Both a fatty acid feedstream (commercial oleic acid in this
example) having a typical composition: 2.4% C14; 0.7% C14:1; 4.6% C16; 5.7% C16:1; 5.7% C17:1;
1.0% C18; 69.9% C18:1; 8.8% C18:2; 0.30% C18:3; 0.90% C20:1 and a glucose co-substrate feed were
added in a fed-batch mode beginning near the end of exponential growth. Caustic was added on
pH control during the bioconversion of fatty acids to diacids to maintain the pH in the desired
range. Typically, samples for gene induction studies were collected just prior to starting the fatty
acid feed and over the first 10 hours of bioconversion. Fatty acid and diacid
content was determined by a standard methyl ester protocol using gas liquid chromatography
(GLC). Gene induction was measured using the QC-RT-PCR protocol described in this
application.

EXAMPLE 10

RNA Preparation

The first step of this protocol involves the isolation of total cellular RNA from
cultures of C. tropicalis. The cellular RNA was isolated using the Qiagen RNeasy Mini Kit
(Qiagen Inc., Chatsworth, CA) as follows: 2 ml samples of C. tropicalis cultures were collected
from the fermentor in standard 2 ml screw-capped Eppendorf-style tubes at various times before
and after the addition of the fatty acid or alkane substrate. Cell samples were immediately frozen
in liquid nitrogen or a dry-ice/alcohol bath after their harvesting from the fermentor. To isolate
total RNA from the samples, the tubes were allowed to thaw on ice and the cells pelleted by
centrifugation in a microfuge for 5 minutes (min) at 4°C and the supernatant was discarded while
keeping the pellet ice-cold. The microfuge tubes were filled 2/3 full with ice-cold Zirconia/Silica
beads (0.5 mm diameter, Biospec Products, Bartlesville, OK) and the tube filled to the top with
ice-cold RLT* lysis buffer (* buffer included with the Qiagen RNeasy Mini Kit). Cell rupture
was achieved by placing the samples in a mini bead beater (Biospec Products, Bartlesville, OK)
and immediately homogenizing at full speed for 2.5 min. The samples were allowed to cool in an
ice water bath for 1 minute and the homogenization/cool process repeated two more times for a
total of 7.5 min homogenization time in the bead beater. The homogenized cell samples were
microfuged at full speed for 10 min and 700 µl of the RNA-containing supernatant removed and
transferred to a new Eppendorf tube. 700 µl of 70% ethanol was added to each sample followed
by mixing by inversion. This and all subsequent steps were performed at room temperature.
Seven hundred microliters of each ethanol-treated sample were transferred to a Qiagen RNeasy
spin column, followed by centrifugation at 8,000 x g for 15 sec. The flow through was
discarded and the column reloaded with the remaining sample (700 µl) and re-centrifuged at
8,000 x g for 15 sec. The column was washed once with 700 µl of buffer RW1*, and
centrifuged at 8,000 x g for 15 sec and the flow through discarded. The column was placed in a
new 2 ml collection tube and washed with 500 µl of RPE* buffer and the flow through discarded.
The RPE* wash was repeated with centrifugation at 8,000 x g for 2 min and the flow through
discarded. The spin column was transferred to a new 1.5 ml collection tube and 100 µl of RNase-
free water added to the column followed by centrifugation at 8,000 x g for 15 seconds. An
additional 75 µl of RNase-free water was added to the column followed by centrifugation at
8,000 x g for 2 min. RNA eluted in the water flow through was collected for further purification.

The RNA eluate was then treated to remove contaminating DNA. Twenty
microliters of 10X DNase I buffer (0.5 M Tris (pH 7.5), 50 mM CaCl2, 100 mM MgCl2), 10 µl of
RNase-free DNase I (2 Units/µl, Ambion Inc., Austin, Texas) and 40 units Rnasin (Promega
Corporation, Madison, Wisconsin) were added to the RNA sample. The mixture was then
incubated at 37°C for 15 to 30 min. Samples were placed on ice and 250 µl Lysis buffer RLT*
and 250 µl ethanol (200 proof) added. The samples were then mixed by inversion. The samples
were transferred to Qiagen RNeasy spin columns and centrifuged at 8,000 x g for 15 sec and the
flow through discarded. Columns were placed in new 2 ml collection tubes and washed twice
with 500 µl of RPE* wash buffer and the flow through discarded. Columns were transferred to
new 1.5 ml Eppendorf tubes and RNA was eluted by the addition of 100 µl of DEPC treated
water followed by centrifugation at 8,000 x g for 15 sec. Residual RNA was collected by adding
an additional 50 µl of RNase-free water to the spin column followed by centrifugation at full
speed for 2 min. 10 µl of the RNA preparation was removed and quantified by the (A260/280)
method. RNA was stored at -70°C. Yields were found to be 30-100 µg total RNA per 2.0 ml of
fermentation broth.
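The absorbance-based quantification mentioned above can be sketched as follows. This is a minimal illustration only: it assumes the conventional factor of about 40 µg/ml of RNA per A260 unit, and the absorbance readings, dilution factor and function names are hypothetical rather than taken from this example. The 150 µl eluate volume is the sum of the 100 µl and 50 µl elutions described above.

```python
# Sketch: estimate RNA concentration and yield from absorbance readings.
# Assumption: 1 A260 unit corresponds to roughly 40 ug/ml RNA (conventional factor).

def rna_conc_ug_per_ml(a260, dilution_factor=1.0, factor=40.0):
    """Return the estimated RNA concentration in ug/ml."""
    return a260 * factor * dilution_factor

def purity_ratio(a260, a280):
    """Return the A260/A280 ratio used as a rough purity check."""
    return a260 / a280

# Hypothetical readings for a sample diluted 1:50 before measurement.
a260, a280 = 0.25, 0.125
conc = rna_conc_ug_per_ml(a260, dilution_factor=50)
eluate_ml = 0.150                      # 100 ul + 50 ul elutions
total_ug = conc * eluate_ml

print(f"concentration: {conc:.1f} ug/ml")
print(f"total RNA:     {total_ug:.1f} ug")
print(f"A260/A280:     {purity_ratio(a260, a280):.2f}")
```

With these made-up readings the estimate falls inside the 30-100 µg range reported above.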

EXAMPLE 11

Quantitative Competitive Reverse Transcription Polymerase
Chain Reaction (QC-RT-PCR) Protocol

QC-RT-PCR is a technique used to quantitate the amount of a specific RNA in an
RNA sample. This technique employs the synthesis of a specific DNA molecule that is
complementary to an RNA molecule in the original sample by reverse transcription and its
subsequent amplification by polymerase chain reaction. By the addition of various amounts of a
competitor RNA molecule to the sample one can determine the concentration of the RNA
molecule of interest (in this case the mRNA transcripts of the CYP and CPR genes). The levels
of specific mRNA transcripts were assayed over time in response to the addition of fatty acid
and/or alkane substrates to the growth medium of fermentation grown C. tropicalis cultures for
the identification and characterization of the genes involved in the oxidation of these substrates.
This approach can be used to identify the CYP and CPR genes involved in the oxidation of any
given substrate based upon their transcriptional regulation.

A. Primer Design

The first requirement for QC-RT-PCR is the design of the primer pairs to be used
in the reverse transcription and subsequent PCR reactions. These primers need to be unique and
specific to the gene of interest. As there is a family of genetically similar CYP genes present in
C. tropicalis 20336, care had to be taken to design primer pairs that would be discriminating and
only amplify the gene of interest, in this example the CYP52A5 gene. In this manner, unique
primers directed to substantially non-homologous (aka variable) regions within target members
of a gene family are constructed. What constitutes substantially non-homologous regions is
determined on a case by case basis. Such unique primers should be specific enough to anneal to the
non-homologous region of the target gene without annealing to other non-target members of the
gene family. By comparing the known sequences of the members of a gene family, non-homologous
regions are identified and unique primers are constructed which will anneal to those
regions. It is contemplated that non-homologous regions herein would typically exhibit less than
about 85% homology but can be more homologous depending on the positions which are
conserved and stringency of the reaction. After conducting PCR, it may be helpful to check the
reaction product to assure it represents the unique target gene product. If not, the reaction
conditions can be altered in terms of stringency to focus the reaction to the desired target.
Alternatively a new primer or new non-homologous region can be chosen. Due to the high level
of homology between the genes of the CYP52A family, the most variable 5 prime region of the
CYP52A5 coding sequence was targeted for the design of the primer pairs. In Figure 3, a portion
of the 5 prime coding region for the CYP52A5A (SEQ ID NO: 36) allele of C. tropicalis 20336 is
shown. The boxed sequences in Figure 3 are the sequences of the forward and backward
primers (SEQ ID NOS: 47 and 48) used to quantitate expression of both alleles of this gene. The
actual reverse primer (SEQ ID NO: 48) contains one less adenine than that shown in Figure 3.
Primers used to measure the expression of specific C. tropicalis 20336 genes using the QC-RT-
PCR protocol are listed in Table 5 (SEQ ID NOS: 37-58).
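The "less than about 85% homology" guideline for a candidate primer site can be sketched as a simple identity check against the aligned region of each other family member. This is an illustration under stated assumptions: the sequences are taken to be already aligned and of equal length, the paralog sequence below is invented for the example, and the function names are hypothetical.

```python
# Sketch: screen a candidate primer site against aligned regions of related genes.
# Assumption: regions are pre-aligned, ungapped and of equal length.

def percent_identity(a, b):
    """Percent of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def is_discriminating(primer_site, other_sites, threshold=85.0):
    """True if every non-target region is below the identity threshold."""
    return all(percent_identity(primer_site, s) < threshold for s in other_sites)

target_site = "AAGAGGGCAGGGCTCAAGAG"            # forward primer, SEQ ID NO: 47
hypothetical_paralog = "AAGAAGGTAGAGCACAAGAA"   # invented sequence for illustration

identity = percent_identity(target_site, hypothetical_paralog)
print(f"identity to paralog region: {identity:.1f}%")
print("discriminating:", is_discriminating(target_site, [hypothetical_paralog]))
```

As the text notes, identity alone is only a first screen; the positions of the mismatches and the stringency of the reaction also matter.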




wo 00/20566 



Table 5. Primers used to measure C. tropicalis gene expression in
the QC-RT-PCR reactions.

Primer Name | Direction | Target | Sequence
3737-89F | F | CYP52A1A | CCGATGAAGTTTTCGACGAGTACX:C (SEQ ID NO: 37)
3737-89B | B | CYP52A2A | AAGGCTTTAACGTGTCCAATCTGGTC (SEQ ID NO: 38)
alk2aF1 | F | CYP52A2A | ATTATCGCCACATACTTCACCAAATGG (SEQ ID NO: 39)
alk2aB5 | B | CYP52A2A | CGAGATCGTGGATACGCTGGAGTG (SEQ ID NO: 40)
#,901*I r ©"J | F | CYP52A3A | GCCACTCGGTAACTTTGTCAGGGAC (SEQ ID NO: 41)
 | B | CYP52A3A | CATTGAACTGAGTAGCCAAAACAGCC (SEQ ID NO: 42)
 | F | CYP52A3A & CYP52A3B | CCTACGTTTGGTATCGCTACTCCGTTG (SEQ ID NO: 43)
3737-50B | B | CYP52A3A & CYP52A3B | TTTCCAGCCAGCACCGTCCAAG (SEQ ID NO: 44)
3737-175F | F | CYP52D4A | GCAGAGCCGATCTATGTTGCGTCC (SEQ ID NO: 45)
3737-175B | B | CYP52D4A | TCATTGAATGCTTCCAGGAACCTCG (SEQ ID NO: 46)
7581-97-F | F | CYP52A5A & CYP52A5B | AAGAGGGCAGGGCTCAAGAG (SEQ ID NO: 47)
7581-97-M | B | CYP52A5A & CYP52A5B | TCCATGTGAAGATCCCATCAC (SEQ ID NO: 48)
4P-2 | F | CYP52A8A | CTTGAAGGCCGTGTTGAACG (SEQ ID NO: 49)
4M-1 | B | CYP52A8A | CAGGATTTGTCTGAGTTGCCG (SEQ ID NO: 50)
3737-52F | F | POX4A & POX4B | CCATTGCCTTGAGATACGCCATTGGTAG (SEQ ID NO: 51)
3737-52B | B | POX4A & POX4B | AGCCTTGGTGTCGTTCTTTTCAACGG (SEQ ID NO: 52)
3737-53F | F | POX5A | TTGGGTTTGTTTGTTTCCTGTGTCCG (SEQ ID NO: 53)
3737-53B | B | POX5A | CCTTTGACCTTCAATCTGGCGTAGACG (SEQ ID NO: 54)
F33 | F | CPRA | GGTTTGCTGAATACGCTGAAGGTGATG (SEQ ID NO: 55)
B63 | B | CPRA | TGGAGCTGAACAACTCTCTCGTCTCGG (SEQ ID NO: 56)
3737-133F | F | CPRA & CPRB | TTCCTCAACACGGACAGCGG (SEQ ID NO: 57)
3737-133B | B | CPRA & CPRB | AGTCAACCAGGTGTGGAACTCGTC (SEQ ID NO: 58)
F=Forward B=Backward

B. Design and Synthesis of the Competitor DNA Template

The competitor RNA is synthesized in vitro from a competitor DNA template that
has the T7 polymerase promoter and preferably carries a small deletion of e.g., about 10 to 25
nucleotides relative to the native target RNA sequence. The DNA template for the in vitro
synthesis of the competitor RNA is synthesized using PCR primers that are between 46 and 60
nucleotides in length. In this example, the primer pairs for the synthesis of the CYP52A5
competitor DNA are shown in Tables 6 and 7 (SEQ ID NOS: 59 and 60).



Table 6. Forward and Reverse primers used to synthesize the competitor RNA template for
the QC-RT-PCR measurement of CYP52A5A gene expression.

Forward Primer | CYP52A5A | GGATCCTAATACGACTCACTATAGGGAGGAAGAGGGCAGGGCTCAAGAG (SEQ ID NO: 59)
Reverse Primer | CYP52A5A | TCCATGTGAAGATCCCATCACGAGTGTGCCTCTTGCCCAAAG (SEQ ID NO: 60)






Table 7. Primers for the synthesis of the QC-RT-PCR competitor RNA templates

Primer Name | Direction | Target | Sequence 5'-3'
3737-89C | F | CYP52A1A | GGATCCTAATACGACTCACTATAGGGAGGCCGATGAAGTTTTCGACGAGTACX:C (SEQ ID NO: 61)
3737-89D | B | CYP52A1A | AAGGCTTTAACGTGTCCAATCTGGTCAACATAGCTCTGGAGTGCTTCCAACC (SEQ ID NO: 62)
7581-137-A | F | CYP52A2A | GGATCCTAATACGACTCACTATAGGGAGGATTATCGCCACATACTTCACCAAATGG (SEQ ID NO: 63)
7581-137-B | B | CYP52A2A | CGAGATCGTGGATACGCTGGAGTGCGTCGCTCTTCTTCTTCAACAATTCAAG (SEQ ID NO: 64)
7581-137-D | B | CYP52A3A | CATTGAACTGAGTAGCCAAAACAGCCCATGGTTTCAATCAATGGGAGGC (SEQ ID NO: 65)
7581-137-C | F | CYP52A3A | GGATCCTAATACGACTCACTATAGGGAGGGCCACTCGGTAACTTTGTCAGGGAC (SEQ ID NO: 66)
3737-50-D | F | CYP52A3A & CYP52A3B | GGATCCTAATACGACTCACTATAGGGAGGCCTACGTTTGGTATCGCTACTCCGTTG (SEQ ID NO: 67)
3737-50-C | B | CYP52A3A & CYP52A3B | TTTCCAGCCAGCACCGTCCAAGCAACAAGGAGTACAAGAAATCGTGTC (SEQ ID NO: 68)
3737-175C | F | CYP52D4A | GGATCCTAATACGACTCACTATAGGGAGGGCAGAGCCGATCTATGTTGCGTCC (SEQ ID NO: 69)
3737-175D | B | CYP52D4A | TCATTGAATGCTTCCAGGAACCTCGCCACATCCATCGAGAACCGG (SEQ ID NO: 70)
7581-97-A | F | CYP52A5A & CYP52A5B | GGATCCTAATACGACTCACTATAGGGAGGAAGAGGGCAGGGCTCAAGAG (SEQ ID NO: 59)
7581-97-B | B | CYP52A5A & CYP52A5B | TCCATGTGAAGATCCCATCACGAGTGTGCCTCTTGCCCAAAG (SEQ ID NO: 60)
4P.2n7 | F | CYP52A8A | GGATCCTAATACGACTCACTATAGGGAGGCTTGAAGGCCGTGTTGAACG (SEQ ID NO: 71)
4M-3/4M-1 | B | CYP52A8A | CAGGATTTGTCTGAGTTGCCGCCTGATCAAGATAGGATCCTTGCCG (SEQ ID NO: 72)
3737-26-D | F | CPRA | GGATCCTAATACGACTCACTATAGGGAGGGGTTTGCTGAATACGCTGAAGGTGATG (SEQ ID NO: 73)
3737-26-C | B | CPRA | TGGAGCTGAACAACTCTCTCGTCTCGGGTGGTC^ATGGACCCTTGGTCAAG (SEQ ID NO: 74)
3737-133C | F | CPRA & CPRB | GGATCCTAATACGACTCACTATAGGGAGGTTCCTCAACACGGACAGCGG (SEQ ID NO: 75)
3737-133D | B | CPRA & CPRB | AGTCAACCAGGTGTGGAACTCGTCGGTGGCAACAATGAAAAACACCAAG (SEQ ID NO: 76)
3737-52-C | F | POX4A & POX4B | GGATCCTAATACGACTCACTATAGGGAGGCCATTGCCTTGAGATACGCCATTGGTAG (SEQ ID NO: 77)
3737-52-D | B | POX4A & POX4B | AGCCTTGGTGTCGTTCTTTTCAACGGAAGGTGGTCTCGATGGTGTGTTCAACC (SEQ ID NO: 78)
3737-53-C | F | POX5A | GGATCCTAATACGACTCACTATAGGGAGGTTGGGTTTGTTTGTTTCCTGTGTCCG (SEQ ID NO: 79)
3737-53-D | B | POX5A | CCTTTGACCTTCAATCTGGCGTAGACGCAGCA<XACCGATCCACCACTTG (SEQ ID NO: 80)
F=Forward B=Backward

The forward primer (SEQ ID NO: 59) contains the T7 promoter consensus sequence
"GGATCCTAATACGACTCACTATAGGGAGG" fused to the primer 7581-97-F sequence
(SEQ ID NO: 47). The Reverse Primer (SEQ ID NO: 60) contains the sequence of primer 7581-
97M (SEQ ID NO: 48) followed by the 20 bases of upstream sequence with an 18 base pair
deletion between the two blocks of the CYP52A5 sequence. The forward primer was used with
the corresponding reverse primer to synthesize the competitor DNA template. The primer pairs
were combined in a standard Taq Gold polymerase PCR reaction according to the manufacturer's
recommended conditions (Perkin-Elmer/Applied Biosystems, Foster City, CA). The PCR
reaction mix contained a final concentration of 250 nM each primer and 10 ng C. tropicalis
chromosomal DNA for template. The reaction mixture was placed in a
thermocycler for 25 to 35 cycles using the highest annealing temperature possible during the
PCR reactions to assure a homogeneous PCR product (in this case 62°C). The PCR products
were either gel purified or filter purified to remove unincorporated nucleotides and primers.
The competitor template DNA was then quantified using the (A260/280) method. Primers used in
QC-RT-PCR experiments for the synthesis of various competitive DNA templates are listed in
Table 7 (SEQ ID NOS: 61-80).
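The construction of the CYP52A5 competitor primers can be checked with a short sketch. The sequences are copied from Tables 5 and 6 as transcribed in this text; the variable names are illustrative only and the check is not part of the described protocol.

```python
# Sketch: assemble the forward competitor primer and inspect the reverse one.
# Sequences are transcribed from Tables 5 and 6; names are illustrative.

T7_PROMOTER = "GGATCCTAATACGACTCACTATAGGGAGG"   # T7 promoter consensus with 5' extension
SEQ_47 = "AAGAGGGCAGGGCTCAAGAG"                 # forward primer 7581-97-F
SEQ_48 = "TCCATGTGAAGATCCCATCAC"                # reverse primer 7581-97M

# Table 6 entries, joined from the two printed lines of each sequence.
SEQ_59 = "GGATCCTAATACGACTCACTATAGGGAGGA" + "AGAGGGCAGGGCTCAAGAG"
SEQ_60 = "TCCATGTGAAGATCCCATCACGAGTGTGCC" + "TCTTGCCCAAAG"

# The forward competitor primer is the T7 promoter fused to the forward primer.
forward_competitor = T7_PROMOTER + SEQ_47
print("forward matches SEQ ID NO: 59:", forward_competitor == SEQ_59)

# The reverse competitor primer starts with the reverse primer; the remainder
# is the second block of CYP52A5 sequence that lies beyond the deleted region.
print("reverse starts with SEQ ID NO: 48:", SEQ_60.startswith(SEQ_48))
second_block = SEQ_60[len(SEQ_48):]
print("second block:", second_block)
```

Because the reverse competitor primer joins two blocks that are not adjacent in the native gene, the amplified template lacks the intervening bases, which is what lets the competitor and native products be separated later.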



C. Synthesis of the Competitor RNA

Competitor template DNA was transcribed in vitro to make the competitor RNA
using the Megascript T7 kit from Ambion Biosciences (Ambion Inc., Austin, Texas). 250
nanograms (ng) of competitor DNA template and the in vitro transcription reagents were mixed
according to the directions provided by the manufacturer. The reaction mixture was incubated
for 4 hours at 37°C. The resulting RNA preparations were then checked by gel electrophoresis
for the conditions giving the highest yields and quality of competitor RNA. This often required
optimization according to the manufacturer's specifications. The DNA template was then
removed using DNase I as described in the Ambion kit. The RNA competitor was then
quantified by the (A260/280) method. Serial dilutions of the RNA (1 ng/µl to 1 femtogram (fg)/µl)
were made for use in the QC-RT-PCR reactions and the original stocks stored at -70°C.
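The dilution range above can be laid out as a simple series. The step size is not stated in the text, so the sketch below assumes tenfold steps; concentrations are kept as integer femtograms per microliter to avoid rounding, and the function names are hypothetical.

```python
# Sketch: a tenfold serial dilution from 1 ng/ul down to 1 fg/ul.
# Assumption: tenfold steps (the text gives only the end points).

def tenfold_series(start_fg_per_ul, end_fg_per_ul):
    """Return concentrations (fg/ul) from start down to end in tenfold steps."""
    series = []
    conc = start_fg_per_ul
    while conc >= end_fg_per_ul:
        series.append(conc)
        conc //= 10
    return series

def label(conc_fg):
    """Format a concentration given in fg/ul with a readable unit."""
    if conc_fg >= 1_000_000:
        return f"{conc_fg // 1_000_000} ng/ul"
    if conc_fg >= 1_000:
        return f"{conc_fg // 1_000} pg/ul"
    return f"{conc_fg} fg/ul"

series = tenfold_series(1_000_000, 1)   # 1 ng = 1,000,000 fg
for tube, conc in enumerate(series, start=1):
    print(f"tube {tube}: {label(conc)}")
```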



D. QC-RT-PCR Reactions

QC-RT-PCR reactions were performed using rTth polymerase from Perkin-
Elmer (Perkin-Elmer/Applied Biosystems, Foster City, CA) according to the manufacturer's
recommended conditions. The reverse transcription reaction was performed in a 10 µl volume
with final concentrations of 200 µM for each dNTP, 1.25 units rTth polymerase, 1.0 mM
MnCl2, 1X of the 10X buffer supplied with the enzyme from the manufacturer,
100 ng of total RNA isolated from a fermentor grown culture of C. tropicalis and 1.25 µM of the
appropriate reverse primer. To quantitate CYP52A5 expression in C. tropicalis an appropriate
reverse primer was 7581-97M (SEQ ID NO: 48). Several reaction mixes were prepared for each
RNA sample characterized. To quantitate CYP52A5 expression a series of 8 to 12 of the
previously described QC-RT-PCR reaction mixes were aliquoted to different reaction tubes. To
each tube 1 µl of a serial dilution containing from 100 pg to 100 fg CYP52A5 competitor RNA
per µl was added, bringing the final reaction mixtures up to the final volume of 10 µl. The QC-
RT-PCR reaction mixtures were mixed and incubated at 70°C for 15 min according to the
manufacturer's recommended times for reverse transcription to occur. At the completion of the
15 minute incubation, the sample temperature was reduced to 4°C to stop the reaction and 40 µl
of the PCR reaction mix added to the reaction to bring the total volume up to 50 µl. The PCR
reaction mix consists of an aqueous solution containing 0.3125 µM of the forward primer 7581-
97F (SEQ ID NO: 47), 3.125 mM MgCl2 and 1X chelating buffer supplied with the enzyme from
Perkin-Elmer. The reaction mixtures were placed in a thermocycler (Perkin-Elmer GeneAmp
PCR System 2400, Perkin-Elmer/Applied Biosystems, Foster City, CA) and the following PCR
cycle performed: 94°C for 1 min, followed by 94°C for 10 seconds followed by 58°C for 40
seconds for 17 to 22 cycles. The PCR reaction was completed with a final incubation at 58°C for
2 min followed by 4°C. In some reactions where no detectable PCR products were produced the
samples were returned to the thermocycler for additional cycles, and this process was repeated until
enough PCR products were produced to quantify using HPLC. The number of cycles necessary
to produce enough PCR product is a function of the amount of the target mRNA in the 100 ng of
total cellular RNA. In cultures where the CYP52A5 gene is highly expressed there is sufficient
CYP52A5 mRNA message present and fewer PCR cycles (<17) are required to produce a
quantifiable amount of PCR product. The lower the concentration of the target mRNA present,
the more PCR cycles are required to produce a detectable amount of product. These QC-RT-
PCR procedures were applied to all the target genes listed in Table 5 using the respective primers
indicated therein.
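The effect of adding 40 µl of PCR reaction mix to the 10 µl reverse transcription reaction can be worked through as a dilution calculation. This sketch takes the forward primer concentration in the PCR mix to be 0.3125 in micromolar units, which is an assumption where the unit is unclear in the text; the function name is hypothetical.

```python
# Sketch: final concentrations after combining the RT reaction with the PCR mix.
# C_final = C_initial * V_component / V_total

def final_conc(initial_conc, component_volume_ul, total_volume_ul):
    """Concentration of a component after dilution into the total volume."""
    return initial_conc * component_volume_ul / total_volume_ul

RT_VOLUME_UL = 10.0
PCR_MIX_VOLUME_UL = 40.0
TOTAL_UL = RT_VOLUME_UL + PCR_MIX_VOLUME_UL

# Components supplied in the 40 ul PCR mix (forward primer unit assumed to be uM).
forward_primer_um = final_conc(0.3125, PCR_MIX_VOLUME_UL, TOTAL_UL)
mgcl2_mm = final_conc(3.125, PCR_MIX_VOLUME_UL, TOTAL_UL)

# Component carried over from the 10 ul RT reaction.
reverse_primer_um = final_conc(1.25, RT_VOLUME_UL, TOTAL_UL)

print(f"forward primer: {forward_primer_um:.3f} uM")
print(f"reverse primer: {reverse_primer_um:.3f} uM")
print(f"MgCl2:          {mgcl2_mm:.3f} mM")
```

Under that assumption both primers end up at the same final concentration in the 50 µl reaction, which is consistent with the mix having been formulated at 1.25 times the intended final strength.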



E. HPLC Quantification

Upon completion of the QC-RT-PCR reactions the samples were analyzed and
quantitated by HPLC. Five to fifteen microliters of the QC-RT-PCR reaction mix was injected
into a Waters Bio-Compatible 625 HPLC with an attached Waters 484 tunable detector. The
detector was set to measure a wavelength of 254 nm. The HPLC contained a Sarasep brand
DNASep column (Sarasep, Inc., San Jose, CA) which was placed within the oven and the
temperature set for 52°C. The column was installed according to the manufacturer's
recommendation of having 30 cm of heated PEEK tubing installed between the injector and the
column. The system was configured with a Sarasep brand Guard column positioned before the
injector. In addition, there was a 0.22 µm filter disk just before the column, within the oven.
Two buffers were used to create an elution gradient to resolve and quantitate the PCR products
from the QC-RT-PCR reactions. Buffer-A consists of 0.1 M tri-ethyl ammonium acetate
(TEAA) and 5% acetonitrile (volume to volume). Buffer-B consists of 0.1 M TEAA and 25%
acetonitrile (volume to volume). The QC-RT-PCR samples were injected into the HPLC and the
linear gradient of 75% buffer-A/25% buffer-B to 45% buffer-A/55% buffer-B was run over 6 min at a
flow rate of 0.85 ml per minute. The QC-RT-PCR product of the competitor RNA, being 18
base pairs smaller, is eluted from the HPLC column before the QC-RT-PCR product from the
CYP52A5 mRNA (U). The amounts of the QC-RT-PCR products are plotted and quantitated with
an attached Waters Corporation 745 data module. The log ratios of the amount of CYP52A5
mRNA QC-RT-PCR product (U) to competitor QC-RT-PCR product (C), as measured by peak
areas, were plotted and the amount of competitor RNA required to equal the amount of CYP52A5
mRNA product determined. In the case of each of the target genes listed in Table 5, the
competitor RNA contained fewer base pairs as compared to the native target mRNA and eluted
before the native mRNA in a manner similar to that demonstrated for CYP52A5. HPLC
quantification of the genes was conducted as above.
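The equivalence-point determination described above can be sketched as a straight-line fit of log(U/C) against the log of the competitor amount, solved for the point where the ratio equals one. This is a minimal sketch, not the data module's own procedure: the peak-area ratios below are invented so that the answer is easy to check, and the function name is hypothetical.

```python
# Sketch: find the competitor amount at which U/C = 1 (log ratio = 0).
import math

def equivalence_point(competitor_amounts, ratios):
    """Fit log10(U/C) vs log10(competitor) by least squares; return the
    competitor amount at which the fitted log ratio crosses zero."""
    xs = [math.log10(c) for c in competitor_amounts]
    ys = [math.log10(r) for r in ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** (-intercept / slope)

# Hypothetical data: competitor RNA added (pg) and measured peak-area ratio U/C.
competitor_pg = [100.0, 10.0, 1.0, 0.1]
ratio_u_over_c = [0.1, 1.0, 10.0, 100.0]

target_equivalent_pg = equivalence_point(competitor_pg, ratio_u_over_c)
print(f"competitor amount at equivalence: {target_equivalent_pg:.2f} pg")
```

In practice the points lying closest to a ratio of one carry the most information, so a fit restricted to that neighbourhood may be preferable to one over the whole dilution series.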

EXAMPLE 12

Evaluation of New Strains in Shake Flasks

The CYP and CPR amplified strains such as strains HDC10, HDC15, HDC20 and
HDC23 (Table 1) and H5343 were evaluated for diacid production in shake flasks. A single
colony for each strain was transferred from a YPD agar plate into 5 ml of YPD broth and grown
overnight at 30°C, 250 rpm. An inoculum was then transferred into 50 ml of DCA2 medium
(Chart) and grown for 24 h at 30°C, 300 rpm. The cells were centrifuged at 5000 rpm for 5 min
and resuspended in 50 ml of DCA3 medium (Chart) and grown for 24 h at 30°C, 300 rpm. 3%
oleic acid w/v was added after 24 h growth in DCA3 medium and the cultures were allowed to
bioconvert oleic acid for 48 h. Samples were harvested and the diacid and monoacid
concentrations were analyzed as per the scheme given in Figure 35. Each strain was tested in
duplicate and the results shown in Table 8 represent the average value from two flasks.

Table 8. Bioconversion of oleic acid by different recombinant strains of Candida tropicalis

Strain | Conversion to Oleic diacid (%) | Specific Conversion (g diacid/g biomass)
H5343 | 41.9 | 0.53
HDC 10-2 | 50.5 | 0.85
HDC15 | 54.4 | 0.85
HDC 20-1 | 45.1 | 0.72
HDC 20-2 | 45.3 | 0.58
HDC 23-2 | 55.2 | 0.84
HDC 23-3 | 58.8 | 0.89
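The Table 8 values can be compared against the H5343 strain with a short calculation. The figures are those of Table 8; treating H5343 as the reference for comparison, and the derived quantities themselves, are choices made for this sketch rather than results stated in the text.

```python
# Sketch: relative change in conversion and specific conversion versus H5343.
# Values are (conversion %, specific conversion in g diacid/g biomass) from Table 8.

table_8 = {
    "H5343":    (41.9, 0.53),
    "HDC 10-2": (50.5, 0.85),
    "HDC15":    (54.4, 0.85),
    "HDC 20-1": (45.1, 0.72),
    "HDC 20-2": (45.3, 0.58),
    "HDC 23-2": (55.2, 0.84),
    "HDC 23-3": (58.8, 0.89),
}

ref_conv, ref_spec = table_8["H5343"]

def relative_increase_percent(value, reference):
    """Percent increase of value over reference."""
    return 100.0 * (value - reference) / reference

for strain, (conv, spec) in table_8.items():
    if strain == "H5343":
        continue
    print(f"{strain}: conversion +{relative_increase_percent(conv, ref_conv):.1f}%, "
          f"specific conversion x{spec / ref_spec:.2f}")

best_strain = max(table_8, key=lambda s: table_8[s][0])
best_increase = relative_increase_percent(table_8[best_strain][0], ref_conv)
```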



EXAMPLE 13

Cloning and Characterization of C. tropicalis 20336 Cytochrome P450
Monooxygenase (CYP) and Cytochrome P450 NADPH Oxidoreductase (CPR) Genes

To clone CYP and CPR genes several different strategies were employed.
Available CYP amino acid sequences were aligned and regions of similarity were observed
(Figure 4). These regions corresponded to described conserved regions seen in other cytochrome
P450 families (Goeptar et al., supra and Kalb et al. supra). Proteins from eight eukaryotic
cytochrome P450 families share a segmented region of sequence similarity. One region
corresponded to the HR2 domain containing the invariant cysteine residue near the carboxyl
terminus which is required for heme binding while the other region corresponded to the central
region of the I helix thought to be involved in substrate recognition (Figure 4). Degenerate
oligonucleotide primers corresponding to these highly conserved regions of the CYP52 gene
family present in Candida maltosa and Candida tropicalis ATCC 750 were designed and used
to amplify DNA fragments of CYP genes from C. tropicalis 20336 genomic DNA. These
discrete PCR fragments were then used as probes to isolate full-length CYP genes from the C.
tropicalis 20336 genomic libraries. In a few instances oligonucleotide primers corresponding to
highly conserved regions were directly used as probes to isolate full-length CYP genes from
genomic libraries. In the case of CPR a heterologous probe based upon the known DNA
sequence for the CPR gene from C. tropicalis 750 was used to isolate the C. tropicalis 20336
CPR gene.
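The degenerate primers mentioned above are written with IUPAC ambiguity codes, so each primer name stands for a pool of distinct oligonucleotides. The sketch below counts and enumerates that pool for primers listed in Table 4 (UCup1, UCup2 and FMN1); the function names are illustrative.

```python
# Sketch: degeneracy of primers written with IUPAC nucleotide codes.
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every concrete sequence represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]

UCUP1 = "CGNGAYACNACNGCNGG"       # SEQ ID NO: 23
UCUP2 = "AGRGAYACNACNGCNGG"       # SEQ ID NO: 24
FMN1 = "TCYCAAACWGGTACWGCWGAA"    # SEQ ID NO: 16

for name, seq in [("UCup1", UCUP1), ("UCup2", UCUP2), ("FMN1", FMN1)]:
    print(f"{name}: {degeneracy(seq)}-fold degenerate")
```

UCup1 and UCup2 differ only in their first codon (CGN versus AGR), which covers both codon families for arginine without pooling them into a single, more degenerate primer.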

A. Cloning of the CPR Gene from C. tropicalis 20336

1) Cloning of the CPRA Allele

Approximately 25,000 phage particles from the first genomic library of C.
tropicalis 20336 were screened with a 1.9 kb BamHI-NdeI fragment from plasmid pCU3RED
(See Picataggio et al., Bio/Technology 10:894-898 (1992), incorporated herein by reference)
containing most of the C. tropicalis 750 CPR gene. Five clones that hybridized to the probe
were isolated and the plasmid DNA from these lambda clones was rescued and characterized by
restriction enzyme analysis. The restriction enzyme analysis suggested that all five clones were
identical but it was not clear that a complete CPR gene was present.

PCR analysis was used to determine if a complete CPR gene was present in any of
the five clones. Degenerate primers were prepared for highly conserved regions of known CPR
genes (See Sutter et al., J. Biol. Chem. 265:16428-16436 (1990), incorporated herein by
reference) (Figure 4). Two primers were synthesized for the FMN binding region (FMN1, SEQ
ID NO: 16 and FMN2, SEQ ID NO: 17). One primer was synthesized for the FAD binding
region (FAD, SEQ ID NO: 18), and one primer for the NADPH binding region (NADPH, SEQ
ID NO: 19) (Table 4). These four primers were used in PCR amplification experiments using as
a template plasmid DNA isolated from four of the five clones described above. The FMN (SEQ
ID NOS: 16 and 17) and FAD (SEQ ID NO: 18) primers served as forward primers and the
NADPH primer (SEQ ID NO: 19) as the reverse primer in the PCR reactions. When different
combinations of forward and reverse primers were used, no PCR products were obtained from
any of the plasmids. However, all primer combinations amplified expected size products with a
plasmid containing the C. tropicalis 750 CPR gene (positive control). The most likely reason for
the failure of the primer pairs to amplify a product was that all four of the clones contained a
truncated CPR gene. One of the four clones (pHKM1) was sequenced using the Triplex 5'
(SEQ ID NO: 30) and the Triplex 3' (SEQ ID NO: 31) primers (Table 4) which flank the insert
and the multiple cloning site on the cloning vector, and with the degenerate primer based upon
the NADPH binding site described above. The NADPH primer (SEQ ID NO: 19) failed to yield
any sequence data and this is consistent with the PCR analysis. Sequences obtained with Triplex
primers were compared with C. tropicalis 750 CPR sequence using the MacVector™ program
(Oxford Molecular Group, Campbell, CA). Sequence obtained with the Triplex 3' primer (SEQ
ID NO: 31) showed similarity to an internal sequence of the C. tropicalis 750 CPR gene
confirming that pHKM1 contained a truncated version of a 20336 CPR gene. pHKM1 had a 3.8
kb insert which included a 1.2 kb coding region of the CPR gene accompanied by 2.5 kb of
upstream DNA (Figure 5). Approximately 0.85 kb of the 20336 CPR gene encoding the C-
terminal portion of the CPR protein is missing from this clone.

Since the first Clontech library yielded only a truncated CPR gene, the second
library prepared by Clontech was screened to isolate a full-length CPR gene. Three putative
CPR clones were obtained. The three clones, having inserts in the range of 5-7 kb, were
designated pHKM2, pHKM3 and pHKM4. All three were characterized by PCR using the
degenerate primers described above. Both pHKM2 and pHKM4 gave PCR products with two
sets of internal primers. pHKM3 gave a PCR product only with the FAD (SEQ ID NO: 18) and
NADPH (SEQ ID NO: 19) primers suggesting that this clone likely contained a truncated CPR
gene. All three plasmids were partially sequenced using the two Triplex primers and a third
primer whose sequence was selected from the DNA sequence near the truncated end of the CPR
gene present in pHKM1. This analysis confirmed that both pHKM2 and pHKM4 have sequences that
overlap pHKM1 and that both contained the 3' region of the CPR gene that is missing from
pHKM1. Portions of inserts from pHKM1 and pHKM4 were sequenced and a full-length CPR
gene was identified. Based on the DNA sequence and PCR analysis, it was concluded that
pHKM1 contained the putative promoter region and 1.2 kb of sequence encoding a portion (5'
end) of a CPR gene. pHKM4 had 1.1 kb of DNA that overlapped pHKM1 and contained the
remainder (3' end) of a CPR gene along with a downstream untranslated region (Figure 6).
Together these two plasmids contained a complete CPRA gene with an upstream promoter
region. CPRA is 4206 nucleotides in length (SEQ ID NO: 81) and includes a regulatory region
and a protein coding region (defined by nucleotides 1006-3042) which is 2037 base pairs in
length and codes for a putative protein of 679 amino acids (SEQ ID NO: 83) (Figures 13 and
14). In Figure 13, the asterisks denote conserved nucleotides between CPRA and CPRB, bold
denotes protein coding nucleotides, and the start and stop codons are underlined. The CPRA
protein, when analyzed by the protein alignment program of the GeneWorks™ software package
(Oxford Molecular Group, Campbell, CA), showed extensive homology to CPR proteins from C.
tropicalis 750 and C. maltosa.
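The coordinates given for the protein coding regions can be cross-checked arithmetically. The sketch below uses the nucleotide ranges stated in this example for CPRA (1006-3042) and for the CPRB allele (1033-3069); the function name is illustrative, and the sketch makes no assumption about whether the stated range includes the stop codon.

```python
# Sketch: length and codon count of a coding region from 1-based inclusive coordinates.

def coding_region_stats(start, end):
    """Return (length in nucleotides, whole codons, leftover nucleotides)."""
    length = end - start + 1
    codons, remainder = divmod(length, 3)
    return length, codons, remainder

cpra = coding_region_stats(1006, 3042)
cprb = coding_region_stats(1033, 3069)

print("CPRA: %d nt, %d codons, remainder %d" % cpra)
print("CPRB: %d nt, %d codons, remainder %d" % cprb)
```

Both ranges span the same number of nucleotides and divide evenly into codons, which agrees with the 2037 base pair length stated for CPRA.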



2) Cloning of the CPRB Allele

To clone the second CPRB allele, the third genomic library, prepared by Henkel,
was screened using DNA fragments from pHKM1 and pHKM4 as probes. Five clones were
obtained and these were sequenced with the three internal primers used to sequence CPRA,
designated PRK1.F3 (SEQ ID NO: 20), PRK1.F5 (SEQ ID NO: 21) and PRK4.R20 (SEQ ID NO: 22)
(Table 4), and with the two outside primers (M13 -20 and T3 [Stratagene]) for the polylinker
region present in the pBK-CMV cloning vector. Sequence analysis suggested that four of these
clones, designated pHKM5 to 8, contained inserts which were identical to the CPRA allele
isolated earlier. All four seemed to contain a full-length CPR gene. The fifth clone was very
similar to the CPRA allele, especially in the open reading frame region where the identity was
very high. However, there were significant differences in the 5' and 3' untranslated regions.
This suggested that the fifth clone was the allele to CPRA. The plasmid was designated pHKM9
(Figure 7) and a 4.14 kb region of this plasmid was sequenced. Analysis of this sequence
confirmed the presence of the CPRB allele (SEQ ID NO: 82), which includes a regulatory region
and a protein coding region (defined by nucleotides 1033-3069) (Figure 13). The amino acid
sequence of the CPRB protein is set forth in SEQ ID NO: 84 (Figure 14).

B. Cloning of C. tropicalis 20336 (CYP) Genes

1) Cloning of CYP52A2A, CYP52A3A & 3B and CYP52A5A & 5B

Clones carrying CYP52A2A, A3A, A3B, A5A and A5B genes were
isolated from the first and second Clontech genomic libraries using an oligonucleotide probe
(HemeB1, SEQ ID NO: 27) whose sequence was based upon the amino acid sequence for the
highly conserved heme binding region present throughout the CYP52 family. The first and
second libraries were converted to the plasmid form and screened by colony hybridizations
using the HemeB1 probe (SEQ ID NO: 27) (Table 4). Several potential clones were isolated and
the plasmid DNA was isolated from these clones and sequenced using the HemeB1
oligonucleotide (SEQ ID NO: 27) as a primer. This approach succeeded in identifying five
CYP52 genes. Three of the CYP genes appeared unique, while the remaining two were classified
as alleles. Based upon an arbitrary choice of homology to CYP52 genes from Candida maltosa,
these five genes and corresponding plasmids were designated CYP52A2A (pPA15 [Figure 26]),
CYP52A3A (pPA57 [Figure 29]), CYP52A3B (pPA62 [Figure 30]), CYP52A5A (pPAL3 [Figure
31]) and CYP52A5B (pPA5 [Figure 32]). The complete DNA sequence including regulatory and
protein coding regions of these five genes was obtained and confirmed that all five were CYP52
genes (Figure 15). In Figure 15, the asterisks denote conserved nucleotides among the CYP
genes. Bold indicates the protein coding nucleotides of the CYP genes, and the start and stop
codons are underlined. The CYP52A2A gene as represented by SEQ ID NO: 86 has a protein
coding region defined by nucleotides 1199-2767 and the encoded protein has an amino acid
sequence as set forth in SEQ ID NO: 96. The CYP52A3A gene as represented by SEQ ID NO:
88 has a protein coding region defined by nucleotides 1126-2748 and the encoded protein has
an amino acid sequence as set forth in SEQ ID NO: 98. The CYP52A3B gene as represented by
SEQ ID NO: 89 has a protein coding region defined by nucleotides 913-2535 and the encoded
protein has an amino acid sequence as set forth in SEQ ID NO: 99. The CYP52A5A gene as
represented by SEQ ID NO: 90 has a protein coding region defined by nucleotides 1103-2656 and
the encoded protein has an amino acid sequence as set forth in SEQ ID NO: 100. The CYP52A5B
gene as represented by SEQ ID NO: 91 has a protein coding region defined by nucleotides
1142-2695 and the encoded protein has an amino acid sequence as set forth in SEQ ID NO: 101.
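
The coding-region coordinates listed above lend themselves to a quick consistency check. The following is a minimal sketch, not part of the original disclosure: it assumes the coordinates are 1-based and inclusive, and the names `CODING_REGIONS`, `region_length` and `summarize` are illustrative. It reports each region's length and number of codons; the text does not state whether the stop codon is counted within each range, so the codon count is not converted to a protein length here.

```python
# Coding-region coordinates as stated in the text (assumed 1-based, inclusive).
CODING_REGIONS = {
    "CYP52A2A": (1199, 2767),  # SEQ ID NO: 86
    "CYP52A3A": (1126, 2748),  # SEQ ID NO: 88
    "CYP52A3B": (913, 2535),   # SEQ ID NO: 89
    "CYP52A5A": (1103, 2656),  # SEQ ID NO: 90
    "CYP52A5B": (1142, 2695),  # SEQ ID NO: 91
}


def region_length(start, end):
    """Length in nucleotides of an inclusive, 1-based interval."""
    return end - start + 1


def summarize(regions):
    """Return {gene: (length_nt, codons)}; raise if a length is not a multiple of 3."""
    summary = {}
    for gene, (start, end) in regions.items():
        length = region_length(start, end)
        if length % 3 != 0:
            raise ValueError(f"{gene}: {length} nt is not a whole number of codons")
        summary[gene] = (length, length // 3)
    return summary


if __name__ == "__main__":
    for gene, (length, codons) in summarize(CODING_REGIONS).items():
        print(f"{gene}: {length} nt, {codons} codons")
```

Under these assumptions the two genes of each allele pair (A3A/A3B and A5A/A5B) have coding regions of equal length, which is consistent with their classification as alleles.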


2) Cloning of CYP52A1A and CYP52A8A

CYP52A1A and CYP52A8A genes were isolated from the third genomic library
using PCR fragments as probes. The PCR fragment probe for CYP52A1 was generated after
PCR amplification of 20336 genomic DNA with oligonucleotide primers that were designed,
using all available CYP52 genes from the National Center for Biotechnology Information, to
amplify a region from the Helix I region to the HR2 region. Degenerate forward primers UCup1
(SEQ ID NO: 23) and UCup2 (SEQ ID NO: 24) were designed based upon an amino acid sequence
(-RDTTAG-) from the Helix I region (Table 4). Degenerate primers UCdown1 (SEQ ID NO: 25)
and UCdown2 (SEQ ID NO: 26) were designed based upon an amino acid sequence (-GQQFAL-)
from the HR2 region (Table 4). For the reverse primers, the DNA sequence represents the
reverse complement of the sequence encoding the corresponding amino acid sequence. These
primers were used in pairwise combinations in a PCR reaction with Stoffel Taq DNA polymerase
(Perkin-Elmer Cetus, Foster City, CA) according to the manufacturer's recommended procedure.
A PCR product of approximately 450 bp was obtained. This product was purified from agarose gel
using Gene-clean™ (Bio 101, LaJolla, CA) and ligated to the pTAG™ vector (Figure 17) (R&D
Systems, Minneapolis, MN) according to the recommendations of the manufacturer. No
treatment was necessary to clone into pTAG because it employs the TA cloning
technique. Plasmids from several transformants were isolated and their inserts were
characterized. One plasmid contained the PCR clone intact. The DNA sequence of the PCR
fragment (designated 44CYP3, SEQ ID NO: 107) shared homology with the DNA sequences for
the CYP52A1 gene of C. maltosa and the CYP52A3 gene of C. tropicalis 750. This fragment
was used as a probe in isolating the C. tropicalis 20336 CYP52A1 homolog. The third genomic
library was screened using the 44CYP3 PCR probe (SEQ ID NO: 107) and a clone (pHKM11)
that contained a full-length CYP52 gene was obtained (Figure 8). The clone contained a gene
having regulatory and protein coding regions. An open reading frame of 1572 nucleotides
encoded a CYP52 protein of 523 amino acids (Figures 15 and 16). This CYP52 gene was
designated CYP52A1A (SEQ ID NO: 85) since its putative amino acid sequence (SEQ ID NO:
95) was most similar to the CYP52A1 protein of C. maltosa. The protein coding region of the
CYP52A1A gene is defined by nucleotides 1177-2748 of SEQ ID NO: 85.
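
The degenerate-primer design described above (back-translating a conserved peptide, and taking the reverse complement for the reverse primer) can be illustrated with a short sketch. This is not the procedure used to derive the primers of Table 4: it assumes the standard genetic code for codon counts, and the degenerate sequence used in the usage example is a hypothetical IUPAC encoding of part of -GQQFAL-, not SEQ ID NO: 25 or 26.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
# Assumption: the text does not say which codon table was used for primer design.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}

# Complements of IUPAC nucleotide codes, including degenerate symbols.
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def degeneracy(peptide):
    """Number of distinct DNA sequences that could encode the peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) IUPAC DNA sequence."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    print(degeneracy("RDTTAG"), degeneracy("GQQFAL"))
    # Hypothetical degenerate encoding of G-Q-Q-F-A (GGN CAR CAR TTY GCN).
    print(reverse_complement("GGNCARCARTTYGCN"))
```

The degeneracy figure shows why such primers are usually synthesized as mixtures or restricted to the least degenerate stretch of a conserved motif; the reverse primer is simply the reverse complement of the coding-strand sequence.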

A similar approach was taken to clone CYP52A8A. A PCR fragment probe for
CYP52A8 was generated using primers for highly conserved sequences of CYP52A3, CYP52A2
and CYP52A5 genes of C. tropicalis 750. The reverse primer (primer 2,3,5,M) (SEQ ID NO: 29)
was designed based on the highly conserved heme binding region (Table 4). The design of the
forward primer (primer 2,3,5,P) (SEQ ID NO: 28) was based upon a sequence conserved near
the N-terminus of the CYP52A3, CYP52A2 and CYP52A5 genes from C. tropicalis 750 (Table
4). Amplification of 20336 genomic DNA with these two primers gave a mixed PCR product.
One amplified PCR fragment was 1006 bp long (designated DCA1002). The DNA sequence for
this fragment was determined and was found to have 85% identity to the DNA sequence for the
CYP52D4 gene of C. tropicalis 750. When this PCR product was used to screen the third
genomic library, one clone (pHKM12) was identified that contained a full-length CYP52 gene
along with 5' and 3' flanking sequences (Figure 9). The CYP52 gene included regulatory and
protein coding regions with an open reading frame 1539 nucleotides long which encoded a
putative CYP52 protein of 512 amino acids (Figures 15 and 16). This gene was designated as
CYP52A8A (SEQ ID NO: 92) since its amino acid sequence (SEQ ID NO: 102) was most
similar to the CYP52A8 protein of C. maltosa. The protein coding region of the CYP52A8A gene
is defined by nucleotides 464-2002 of SEQ ID NO: 92. The amino acid sequence of the
CYP52A8A protein is set forth in SEQ ID NO: 102.

3) Cloning of CYP52D4A 

The screening of the second genomic library with the HemeB1 (SEQ ID NO: 27)
primer (Table 4) yielded a clone carrying a plasmid (pPA18) that contained a truncated gene
having homology with the CYP52D4 gene of C. maltosa (Figure 33). A 1.3 to 1.5-kb EcoRI-SstI
fragment from pPA18 containing part of the truncated CYP gene was isolated and used as a
probe to screen the third genomic library for a full-length CYP52 gene. One clone (pHKM13)
was isolated and found to contain a full-length CYP gene with extensive 5' and 3' flanking
sequences (Figure 10). This gene has been designated as CYP52D4A (SEQ ID NO: 94) and the
complete DNA including regulatory and protein coding regions (coding region defined by
nucleotides 767-2266) and putative amino acid sequence (SEQ ID NO: 104) of this gene are
shown in Figures 15 and 16. CYP52D4A (SEQ ID NO: 94) shares the greatest homology with
the CYP52D4 gene of C. maltosa.


4) Cloning of CYP52A2B and CYP52A8B

A mixed probe containing CYP52A1A, A2A, A3A, D4A, A5A and A8A genes was
used to screen the third genomic library and several putative positive clones were identified.
Seven of these were sequenced with the degenerate primers Cyp52a (SEQ ID NO: 32), Cyp52b
(SEQ ID NO: 33), Cyp52c (SEQ ID NO: 34) and Cyp52d (SEQ ID NO: 35) shown in Table 4.
These primers were designed from highly conserved regions of the four CYP52 subfamilies,
namely CYP52A, B, C and D. Sequences from two clones, pHKM14 and pHKM15 (Figures 11 and
12), shared considerable homology with the DNA sequences of the C. tropicalis 20336 CYP52A2
and CYP52A8 genes, respectively. The complete DNA (SEQ ID NO: 87) including regulatory
and protein coding regions (coding region defined by nucleotides 1072-2640) and putative amino
acid sequence (SEQ ID NO: 97) of the CYP52 gene present in pHKM14 suggested that it is
CYP52A2B (Figures 15 and 16). The complete DNA (SEQ ID NO: 93) including regulatory and
protein coding regions (coding region defined by nucleotides 1017-2555) and putative amino
acid sequence (SEQ ID NO: 103) of the CYP52 gene present in pHKM15 suggested that it is
CYP52A8B (Figures 15 and 16).

EXAMPLE 14 

Identification of CYP and CPR Genes Induced by 
Selected Fatty Acid and Alkane Substrates 


Genes whose transcription is turned on by the presence of selected fatty
acid or alkane substrates have been identified using the QC-RT-PCR assay. This assay was used
to measure CYP and CPR gene expression in fermentor-grown cultures of C. tropicalis ATCC
20962. This method involves the isolation of total cellular RNA from cultures of C. tropicalis
and the quantification of a specific mRNA within that sample through the design and use of
sequence-specific QC-RT-PCR primers and an RNA competitor. Quantification is achieved
through the use of known concentrations of highly homologous competitor RNA in the QC-RT-PCR
reactions. The resulting QC-RT-PCR amplified cDNAs are separated and quantitated
through the use of ion pairing reverse phase HPLC. This assay was used to characterize the
expression of CYP52 genes of C. tropicalis ATCC 20962 in response to various fatty acid and
alkane substrates. Genes which were induced were identified by the calculation of their mRNA
concentration at various times before and after induction. Figure 18 provides an example of
how the concentration of mRNA for CYP52A5 can be calculated using the QC-RT-PCR assay.
The log ratio of unknown (U) to competitor product (C) is plotted versus the concentration of
competitor RNA present in the QC-RT-PCR reactions. The concentration of competitor which
results in a log ratio of U/C of zero represents the point where the unknown messenger RNA
concentration is equal to the concentration of the competitor. Figure 18 allows for the
calculation of the amount of CYP52A5 message present in 100 ng of total RNA isolated from
cell samples taken at 0, 1, and 2 hours after the addition of Emersol® 267 in a fermentor run.
From this analysis, it is possible to determine the concentration of the CYP52A5 mRNA present
in 100 ng of total cellular RNA. In the plot contained in Figure 18 it takes 0.46 pg of competitor
to equal the number of mRNAs of CYP52A5 in 100 ng of RNA isolated from cells just prior
(time 0) to the addition of the substrate, Emersol® 267. In cell samples taken at one and two
hours after the addition of Emersol® 267 it takes 5.5 and 8.5 pg of competitor RNA,
respectively. This result demonstrates that CYP52A5 (SEQ ID NOS: 90 and 91) is induced more
than 18-fold within two hours after the addition of Emersol® 267. This type of analysis was
used to demonstrate that CYP52A5 (SEQ ID NOS: 90 and 91) is induced by Emersol® 267.
Figure 19 shows the relative amounts of CYP52A5 (SEQ ID NOS: 90 and 91) expression in
fermentor runs with and without Emersol® 267 as a substrate. The differences in the CYP52A5
(SEQ ID NOS: 90 and 91) expression patterns are due to the addition of Emersol® 267 to the
fermentation medium.
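
The equivalence-point calculation described above can be sketched as follows. This is an illustrative sketch only: it assumes that log10(U/C) is fitted by least squares against log10 of the competitor amount (the text says only that the log ratio is plotted versus competitor concentration), and the titration points in the usage example are hypothetical values constructed to cross zero at 0.46 pg; they are not data from Figure 18.

```python
import math


def equivalence_point(competitor_pg, ratio_u_over_c):
    """Competitor amount (pg) at which the fitted log10(U/C) equals zero.

    Fits a least-squares line through (log10 competitor, log10 U/C) and
    solves for the x-intercept. Assumes at least two distinct competitor amounts.
    """
    xs = [math.log10(c) for c in competitor_pg]
    ys = [math.log10(r) for r in ratio_u_over_c]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** (-intercept / slope)


def fold_induction(before_pg, after_pg):
    """Ratio of message amounts after and before induction."""
    return after_pg / before_pg


if __name__ == "__main__":
    # Hypothetical titration: U/C falls as more competitor is added.
    competitor = [0.1, 0.5, 1.0, 5.0]
    ratios = [4.6, 0.92, 0.46, 0.092]
    print(round(equivalence_point(competitor, ratios), 2))
    # Values quoted in the text: 0.46 pg at time 0 and 8.5 pg at two hours.
    print(round(fold_induction(0.46, 8.5), 1))
```

With the two values quoted in the text, the ratio 8.5/0.46 is about 18.5, in line with the stated induction of more than 18-fold.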

This analysis clearly demonstrates that expression of CYP52A5 (SEQ ID NOS: 90
and 91) in C. tropicalis 20962 is inducible by the addition of Emersol® 267 to the growth
medium. This analysis was performed to characterize the expression of CYP52A2A (SEQ ID
NO: 86), CYP52A3AB (SEQ ID NOS: 88 and 89), CYP52A8A (SEQ ID NO: 92), CYP52A1A
(SEQ ID NO: 85), CYP52D4A (SEQ ID NO: 94) and CPRB (SEQ ID NO: 82) in response to the
presence of Emersol® 267 in the fermentation medium (Figure 20). The results of these
analyses indicate that, like the CYP52A5 gene (SEQ ID NOS: 90 and 91) of C. tropicalis 20962,
the CYP52A2A gene (SEQ ID NO: 86) is inducible by Emersol® 267. A small induction is
observed for CYP52A1A (SEQ ID NO: 85) and CYP52A8A (SEQ ID NO: 92). In contrast, any
induction for CYP52D4A (SEQ ID NO: 94), CYP52A3A (SEQ ID NO: 88) or CYP52A3B (SEQ ID
NO: 89) is below the level of detection of the assay. CPRB (SEQ ID NO: 82) is moderately
induced by Emersol® 267, four to five fold. The results of these analyses are summarized in
Figure 20. Figure 34 provides an example of selective induction of CYP52A genes. When pure
fatty acids or alkanes were spiked into a fermentor containing C. tropicalis 20962 or a derivative
thereof, the transcriptional activation of CYP52A genes was detected using the QC-RT-PCR
assay. Figure 34 shows that pure oleic acid (C18:1) strongly induces CYP52A2A (SEQ ID NO:
86) while inducing CYP52A5 (SEQ ID NOS: 90 and 91). In the same fermentor, addition of pure
alkane (tridecane) shows strong induction of both CYP52A2A (SEQ ID NO: 86) and CYP52A1A
(SEQ ID NO: 85). However, tridecane did not induce CYP52A5 (SEQ ID NOS: 90 and 91). In
a separate fermentation using ATCC 20962, containing pure octadecane as the substrate,
induction of CYP52A2A, CYP52A5A and CYP52A1A is detected (see Figure 36). The foregoing
demonstrates selective induction of particular CYP genes by specific substrates, thus providing
techniques for selective metabolic engineering of cell strains. For example, if tridecane
modification is desired, organisms engineered for high levels of CYP52A2A (SEQ ID NO: 86)
and CYP52A1A (SEQ ID NO: 85) activity are indicated. If oleic acid modification is desired,
organisms engineered for high levels of CYP52A2A (SEQ ID NO: 86) activity are indicated.


EXAMPLE 15 

Integration of Selected CYP and CPR Genes 
into the Genome of Candida tropicalis 

In order to integrate selected genes into the chromosome of C. tropicalis 20336 or
its descendants, there has to be a target DNA sequence, which may or may not be an intact gene,
into which the genes can be inserted. There must also be a method to select for the integration
event. In some cases the target DNA sequence and the selectable marker are the same and, if so,
then there must also be a method to regain use of the target gene as a selectable marker following
the integration event. In C. tropicalis and its descendants, one gene which fits these criteria is
URA3A, encoding orotidine-5'-phosphate decarboxylase. Using it as a target for integration, ura-
variants of C. tropicalis can be transformed in such a way as to regenerate a URA+ genotype via
homologous recombination (Figure 21). Depending upon the design of the integration vector,
one or more genes can be integrated into the genome at the same time. Using a split URA3A
gene oriented as shown in Figure 22, homologous integration would yield at least one copy of the
gene(s) of interest which are inserted between the split portions of the URA3A gene. Moreover,
because of the high sequence similarity between URA3A and URA3B genes, integration of the
construct can occur at both the URA3A and URA3B loci. Subsequently, an oligonucleotide
designed with a deletion in a portion of the URA gene, based on the identical sequence across
both the URA3A and URA3B genes, can be utilized to yield C. tropicalis transformants which
are once again ura- but which still carry one or more newly integrated genes of choice (Figure
21). ura- variants of C. tropicalis can also be isolated via other methods such as classical
mutagenesis or by spontaneous mutation. Using well established protocols, selection of ura-
strains can be facilitated by the use of 5-fluoroorotic acid (5-FOA) as described, e.g., in Boeke et
al., Mol. Gen. Genet. 197:345-346 (1984), incorporated herein by reference. The utility of this
approach for the manipulation of C. tropicalis has been well documented as described, e.g., in
Picataggio et al., Mol. and Cell. Biol. 11:4333-4339 (1991); Rohrer et al., Appl. Microbiol.
Biotechnol. 36:650-654 (1992); Picataggio et al., Bio/Technology 10:894-898 (1992); U.S. Patent
No. 5,648,247; U.S. Patent No. 5,620,878; U.S. Patent No. 5,204,252; U.S. Patent No.
5,254,466, all of which are incorporated herein by reference.

A. Construction of a URA Integration Vector, pURAin.

Primers were designed and synthesized based on the 1712 bp sequence of the
URA3A gene of C. tropicalis 20336 (see Figure 23). The nucleotide sequence of the URA3A
gene of C. tropicalis 20336 is set forth in SEQ ID NO: 105 and the amino acid sequence of the
encoded protein is set forth in SEQ ID NO: 106. URA3A Primer Set #1a (SEQ ID NO: 9) and
#1b (SEQ ID NO: 10) (Table 4) was used in PCR with C. tropicalis 20336 genomic DNA to
amplify URA3A sequences between nucleotide 733 and 1688 as shown in Figure 23. The
primers are designed to introduce unique 5' AscI and 3' PacI restriction sites into the resulting
amplified URA3A fragment. AscI and PacI sites were chosen because these sites are not present
within CYP or CPR genes identified to date. URA3A Primer Set #2 was used in PCR with C.
tropicalis 20336 genomic DNA as a template to amplify URA3A sequences between nucleotide
9 and 758 as shown in Figure 23. URA3A Primer Set #2a (SEQ ID NO: 11) and #2b (SEQ ID
NO: 12) (Table 4) was designed to introduce unique 5' PacI and 3' PmeI restriction sites into the
resulting amplified URA3A fragment. The PmeI site is also not present within CYP and CPR
genes identified to date. PCR fragments of the URA3A gene were purified, restricted with AscI,
PacI and PmeI restriction enzymes and ligated to a gel purified, QiaexII cleaned AscI-PmeI
digest of plasmid pNEB193 (Figure 25) purchased from New England Biolabs (Beverly, MA).
The ligation was performed with an equimolar number of DNA termini at 16°C for 16 hr using
T4 DNA ligase (New England Biolabs). Ligations were transformed into E. coli XL1-Blue cells
(Stratagene, LaJolla, CA) according to the manufacturer's recommendations. White colonies were
isolated and grown, and plasmid DNA was isolated and digested with AscI-PmeI to confirm
insertion of the modified URA3A into pNEB193. The resulting base integration vector was named
pURAin (Figure 24).
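
The reasoning behind the choice of AscI, PacI and PmeI (their recognition sites must be absent from the genes to be cloned) can be checked computationally. The sketch below is illustrative: the recognition sequences are the standard ones for these enzymes, while the helper names and the toy sequence in the usage example are hypothetical and are not taken from any SEQ ID NO. Because all three sites are palindromic, scanning one strand is sufficient.

```python
# Standard recognition sequences for the three enzymes named in the text.
SITES = {"AscI": "GGCGCGCC", "PacI": "TTAATTAA", "PmeI": "GTTTAAAC"}


def find_sites(seq, site):
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    seq = seq.upper()
    hits = []
    i = seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits


def absent_sites(seq, sites=SITES):
    """Names of enzymes whose recognition site does not occur in seq."""
    return sorted(name for name, site in sites.items() if not find_sites(seq, site))


def span_length(first_nt, last_nt):
    """Length of a template region given inclusive, 1-based end coordinates."""
    return last_nt - first_nt + 1


if __name__ == "__main__":
    toy = "ATGGCGCGCCAAATTAATTAACC"  # hypothetical sequence for illustration
    print(find_sites(toy, SITES["AscI"]), find_sites(toy, SITES["PacI"]))
    print(absent_sites(toy))
    # Template regions quoted in the text (primer-added sites not included).
    print(span_length(733, 1688), span_length(9, 758))
```

An enzyme is suitable for this cloning scheme only if `absent_sites` reports it for every gene to be inserted; the span lengths give the size of the URA3A template regions before the primer-encoded restriction sites are added.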

B. Amplification of CYP52A2A, CYP52A3A, CYP52A5A and 
CPRB from C. tropicalis 20336 Genomic DNA

The genes encoding CYP52A2A (SEQ ID NO: 86) and CYP52A3A (SEQ ID NO:
88) from C. tropicalis 20336 were amplified from genomic clones (pPA15 and pPA57,
respectively) (Figures 26 and 29) via PCR using primers (Primer CYP 2A#1, SEQ ID NO: 1 and
Primer CYP 2A#2, SEQ ID NO: 2 for CYP52A2A) (Primer CYP 3A#1, SEQ ID NO: 3 and
Primer CYP 3A#2, SEQ ID NO: 4 for CYP52A3A) to introduce PacI cloning sites. These PCR
primers were designed based upon the DNA sequence determined for CYP52A2A (SEQ ID NO:
86) (Figure 15). The AmpliTaq Gold PCR kit (Perkin Elmer Cetus, Foster City, CA) was used
according to the manufacturer's specifications. The CYP52A2A PCR amplification product was
2,230 base pairs in length, yielding 496 bp of DNA upstream of the CYP52A2A start codon and
168 bp downstream of the stop codon for the CYP52A2A ORF. The CYP52A3A PCR amplification
product was 2154 base pairs in length, yielding 437 bp of DNA upstream of the CYP52A3A start
codon and 97 bp downstream of the stop codon for the CYP52A3A ORF.

The gene encoding CYP52A5A (SEQ ID NO: 90) from C. tropicalis 20336 was
amplified from genomic DNA via PCR using primers (Primer CYP 5A#1, SEQ ID NO: 5 and
Primer CYP 5A#2, SEQ ID NO: 6) to introduce PacI cloning sites. These PCR primers were
designed based upon the DNA sequence determined for CYP52A5A (SEQ ID NO: 90). The
Expand Hi-Fi Taq PCR kit (Boehringer Mannheim, Indianapolis, IN) was used according to the
manufacturer's specifications. The CYP52A5A PCR amplification product was 3,298 base pairs
in length.

The gene encoding CPRB (SEQ ID NO: 82) from C. tropicalis 20336 was
amplified from genomic DNA via PCR using primers (CPR B#1, SEQ ID NO: 7 and CPR B#2,
SEQ ID NO: 8) based upon the DNA sequence determined for CPRB (SEQ ID NO: 82) (Figure
13). These primers were designed to introduce unique PacI cloning sites. The Expand Hi-Fi
Taq PCR kit (Boehringer Mannheim, Indianapolis, IN) was used according to the manufacturer's
specifications. The CPRB PCR product was 3266 bp in length, yielding 747 bp of DNA
upstream of the CPRB start codon and 493 bp downstream of the stop codon for the CPRB ORF.
The resulting PCR products were isolated via agarose gel electrophoresis, purified using QiaexII
and digested with PacI. The PCR fragments were purified, desalted and concentrated using a
Microcon 100 (Amicon, Beverly, MA).

The above described amplification procedures are applicable to the other genes
listed in Table 5 using the respectively indicated primers.

C. Cloning of CYP and CPR Genes into pURAin.

The next step was to clone the selected CYP and CPR genes into the pURAin
integration vector. In a preferred aspect of the present invention, no foreign DNA other than that
specifically provided by synthetic restriction site sequences is incorporated into the DNA which
was cloned into the genome of C. tropicalis, i.e., with the exception of restriction site DNA only
native C. tropicalis DNA sequences are incorporated into the genome. pURAin was digested
with PacI, QiaexII cleaned, and dephosphorylated with Shrimp Alkaline Phosphatase (SAP)
(United States Biochemical, Cleveland, OH) according to the manufacturer's recommendations.
Approximately 500 ng of PacI-linearized pURAin was dephosphorylated for 1 hr at 37°C using
SAP at a concentration of 0.2 Units of enzyme per 1 pmol of DNA termini. The reaction was
stopped by heat inactivation at 65°C for 20 min.
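
The enzyme dose stated above is expressed per pmol of DNA termini, so it depends on the vector length. The following sketch shows the arithmetic under two labeled assumptions that are not in the text: an average mass of 650 g/mol per base pair of double-stranded DNA (a common approximation) and a hypothetical pURAin length of 4400 bp (the text does not give the size of pURAin).

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA; a common approximation (assumption)


def pmol_molecules(mass_ng, length_bp):
    """Picomoles of linear dsDNA molecules in mass_ng nanograms."""
    # ng / (g/mol) gives nmol; multiply by 1000 to convert to pmol.
    return mass_ng / (length_bp * AVG_BP_MASS) * 1000.0


def pmol_termini(mass_ng, length_bp):
    """Picomoles of ends; each linear molecule contributes two termini."""
    return 2.0 * pmol_molecules(mass_ng, length_bp)


def sap_units(mass_ng, length_bp, units_per_pmol=0.2):
    """Units of phosphatase at the stated ratio of 0.2 U per pmol of termini."""
    return units_per_pmol * pmol_termini(mass_ng, length_bp)


if __name__ == "__main__":
    vector_bp = 4400  # hypothetical length; not stated in the text
    print(round(pmol_termini(500, vector_bp), 3))
    print(round(sap_units(500, vector_bp), 3))
```

The same `pmol_termini` helper can be used to balance vector and insert when a ligation calls for an equimolar number of DNA termini.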

The CYP52A2A PacI fragment derived using the primers shown in Table 4 was
ligated to plasmid pURAin which had also been digested with PacI. PacI digested pURAin was
dephosphorylated, and ligated to the CYP52A2A ULTMA PCR product as described previously.
The ligation mixture was transformed into E. coli XL1 Blue MRF' (Stratagene) and 2 resistant
colonies were selected and screened for correct constructs which should contain vector sequence,
the inverted URA3A gene, and the amplified CYP52A2A gene (SEQ ID NO: 86) of 20336.
AscI-PmeI digestion identified one of the two constructs, plasmid pURA2in, as being correct
(Figure 27). This plasmid was sequenced and compared to CYP52A2A (SEQ ID NO: 86) to confirm
that PCR did not introduce DNA base changes that would result in an amino acid change.

Prior to its use, the CPRB PacI fragment derived using the primers shown in
Table 4 was sequenced and compared to CPRB (SEQ ID NO: 82) to confirm that PCR did not
introduce DNA base pair changes that would result in an amino acid change. Following
confirmation, CPRB (SEQ ID NO: 82) was ligated to plasmid pURAin which had also been
digested with PacI. PacI digested pURAin was dephosphorylated, and ligated to the CPR
Expand Hi-Fi PCR product as described previously. The ligation mixture was transformed into
E. coli XL1 Blue MRF' (Stratagene) and several resistant colonies were selected and screened
for correct constructs which should contain vector sequence, the inverted URA3A gene, and the
amplified CPRB gene (SEQ ID NO: 82) of 20336. AscI-PmeI digestion confirmed a successful
construct, pURAREDBin.

In a manner similar to the above, each of the other CYP and CPR genes disclosed
herein is cloned into pURAin. PacI fragments of these genes, whose sequences are given in
Figures 13 and 15, are derivable by methods known to those skilled in the art.

1) Construction of Vectors Used to Generate HDC 20 and HDC 23 

A previously constructed integration vector containing CPRB (SEQ ID NO: 82),
pURAREDBin, was chosen as the starting vector. This vector was partially digested with PacI
and the linearized fragment was gel-isolated. The active PacI site was destroyed by treatment with
T4 DNA polymerase and the vector was re-ligated. Subsequent isolation and complete digestion
of this new plasmid yielded a vector now containing only one active PacI site. This fragment
was gel-isolated, dephosphorylated and ligated to the CYP52A2A PacI fragment. Vectors that
contain the CYP52A2A (SEQ ID NO: 86) and CPRB (SEQ ID NO: 82) genes oriented in the
same direction, pURAin CPR 2A S, as well as in opposite directions (5' ends connected), pURAin
CPR 2A O, were generated.

D. Confirmation of CYP Integration (Figure 21 for Integration Scheme)
into the Genome of C. tropicalis

Based on the construct, pURA2in, used to transform H5343 ura-, a scheme to
detect integration was devised. Genomic DNA from transformants was digested with Dra III
and Spe I, which are enzymes that cut within the URA3A and URA3B genes but not within the
integrated CYP52A2A gene. Digestion of genomic DNA where an integration had occurred at
the URA3A or URA3B loci would be expected to result in a 3.5 kb or a 3.3 kb fragment,
respectively (Figure 28). Moreover, digestion of the same genomic DNA with PacI would yield
a 2.2 kb fragment characteristic for the integrated CYP52A2A gene (Figure 28). Southern
hybridizations of these digests with fragments of the CYP52A2A gene were used to screen for
these integration events. Intensity of the band signal from the Southern using PacI digestion was
used as a measure of the number of integration events (i.e., the more copies of the CYP52A2A
gene (SEQ ID NO: 86) which are present, the stronger the hybridization signal).
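
The use of band intensity as a proxy for the number of integration events can be expressed as a simple normalization. The sketch below is illustrative only: it assumes that the hybridization signal scales linearly with copy number and that lanes were loaded equally, neither of which is stated in the text, and the strain labels and intensity values are hypothetical.

```python
def relative_copy_number(intensities, reference):
    """Band intensity of each strain divided by that of a reference strain.

    Assumes signal is proportional to copy number and lanes are equally loaded.
    """
    ref_value = intensities[reference]
    if ref_value <= 0:
        raise ValueError("reference intensity must be positive")
    return {strain: value / ref_value for strain, value in intensities.items()}


if __name__ == "__main__":
    # Hypothetical densitometry readings (arbitrary units) for the 2.2 kb band.
    bands = {"strain_A": 1200.0, "strain_B": 2400.0, "strain_C": 3600.0}
    for strain, ratio in relative_copy_number(bands, "strain_A").items():
        print(f"{strain}: {ratio:.1f}x")
```

Such ratios are only semi-quantitative; saturation of the signal or uneven transfer would break the linearity assumption.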

C. tropicalis H5343 transformed URA prototrophs were grown at 30°C, 170 rpm,
in 10 ml SC-uracil media for preparation of genomic DNA. Genomic DNA was isolated by the
method described previously. Genomic DNA was digested with SpeI and DraIII. A 0.95%
agarose gel was used to prepare a Southern hybridization blot. The DNA from the gel was
transferred to a MagnaCharge nylon filter membrane (MSI Technologies, Westboro, MA)
according to the alkaline transfer method of Sambrook et al., supra. For the Southern
hybridization, a 2.2 kb CYP52A2A DNA fragment was used as a hybridization probe. 300 ng of
CYP52A2A DNA was labeled using an ECL Direct labeling and detection system (Amersham) and
the Southern was processed according to the ECL kit specifications. The blot was processed in a
volume of 30 ml of hybridization fluid corresponding to 0.125 ml/cm². Following a
prehybridization at 42°C for 1 hr, 300 ng of CYP52A2A probe was added and the hybridization
continued for 16 hr at 42°C. Following hybridization, the blots were washed two times for 20
min each at 42°C in primary wash containing urea. Two 5 min secondary washes at RT were
conducted, followed by detection according to directions. The blots were exposed for 16 hours
(hr) as recommended.

Integration was confirmed by the detection of a SpeI-DraIII 3.5 kb fragment from
the genomic DNA of the transformants but not with the C. tropicalis 20336 control.
Subsequently, a PacI digestion of the genomic DNA of the positive transformants, followed by a
Southern hybridization using a CYP52A2A gene probe, confirmed integration by the detection
of a 2.2 kb fragment. The resulting CYP52A2A integrated strain was named HDC1 (see Table 1).

In a manner similar to the above, each of the genes contained in the PacI
fragments which are described in Section 3c above was confirmed for integration into the
genome of C. tropicalis.

Transformants generated by transformation with the vectors pURAin CPR 2A S
or pURAin CPR 2A O were analyzed by Southern hybridization for tandem integration of both the
CYP52A2A (SEQ ID NO: 86) and CPRB (SEQ ID NO: 82) genes. Three strains were
generated in which the integrated CYP52A2A (SEQ ID NO: 86) and CPRB (SEQ ID NO: 82) genes
are in the opposite orientation (HDC 20-1, HDC 20-2 and HDC 20-3) and three were
generated with the CYP52A2A (SEQ ID NO: 86) and CPRB (SEQ ID NO: 82) genes integrated
in the same orientation (HDC 23-1, HDC 23-2 and HDC 23-3) (Table 1).

E. Confirmation of CPRB Integration into H5343 ura-

Seven transformants were screened by colony PCR using CPRB primer #2 (SEQ
ID NO: 8) and a URA3A-specific primer. In five of the transformants, successful integration
was detected by the presence of a 3899 bp PCR product. This 3899 bp PCR product represents
the CPRB gene adjacent to the URA3A gene in the genome of H5343, thereby confirming
integration. The resulting CPRB integrated strains were named HDC10-1 and HDC10-2 (see
Table 1).

F. Strain Evaluation.

As determined by quantitative PCR, when compared to parent H5343, HDC10-1
contained three additional copies of the reductase gene and HDC10-2 contained four additional
copies of the reductase gene. Evaluations of HDC20-1, HDC20-2 and HDC20-3 based on
Southern hybridization data indicate that HDC20-1 contained multiple integrations, i.e., 2 to 3
times that of HDC20-2 or HDC20-3. Evaluations of HDC23-1, HDC23-2 and HDC23-3 based
on Southern hybridization data indicate that HDC23-3 contained multiple integrations, i.e., 2 to
3 times that of HDC23-1 or HDC23-2. The data in Table 8 indicate that the integration of
components of the ω-hydroxylase complex has a positive effect on the improvement of
Candida tropicalis ATCC 20962 as a biocatalyst. The results indicate that CYP52A5A (SEQ ID
NO: 90) is an important gene for the conversion of oleic acid to diacid. Surprisingly, tandem
integrations of CYP and CPR genes oriented in the opposite direction (HDC 20 strains) seem to
be less productive than tandem integrations oriented in the same direction (HDC 23 strains)
(Tables 1 and 8).

*See Kaiser et al., Methods in Yeast Genetics, Cold Spring Harbor Laboratory Press, USA (1994),
incorporated herein by reference.

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It will be understood that various modifications may be made to Ae embodimrats 
and/or examples disclosed hereia Thus, the above description should not be constiued as 
limiting, but merely as exemplifications of preferred embodiments. Those skilled in the art wiU 
envision oAcr modifications within the scope and spirit of the claims appended hereto. 
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WHAT IS CLAIMED yS: 

1. Isolated nucleic acid encoding a CPRA protein having the amino
acid sequence set forth in SEQ ID NO: 83.

2. Isolated nucleic acid comprising a coding region defined by nucleotides 1006-
3042 as set forth in SEQ ID NO: 81.

3. Isolated nucleic acid according to claim 2 comprising the nucleotide sequence
as set forth in SEQ ID NO: 81.

4. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 83.

5. A vector comprising a nucleotide sequence encoding CPRA protein including
an amino acid sequence as set forth in SEQ ID NO: 83.

6. A vector according to claim 5 wherein the nucleotide sequence is set forth in
nucleotides 1006-3042 of SEQ ID NO: 81.

7. A vector according to claim 5 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.

8. A host cell transfected or transformed with the nucleic acid of claim 1.

9. A host cell according to claim 8 wherein the host cell is a yeast cell.

10. A host cell according to claim 9 wherein the yeast cell is a Candida sp.

11. A host cell according to claim 10 wherein the Candida sp. is Candida tropicalis.

12. A host cell according to claim 11 wherein the Candida tropicalis is Candida
tropicalis 20336.

13. A host cell according to claim 12 wherein the Candida tropicalis is H5343 ura-.

14. A method of producing a CPRA protein including an amino acid sequence as
set forth in SEQ ID NO: 83 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 83; and

b) culturing the cell under conditions favoring the expression of the protein.

15. The method according to claim 14 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

16. Isolated nucleic acid encoding a CPRB protein having the amino acid
sequence set forth in SEQ ID NO: 84.

17. Isolated nucleic acid comprising a coding region defined by nucleotides 1033-
3069 as set forth in SEQ ID NO: 82.

18. Isolated nucleic acid according to claim 17 comprising the nucleotide
sequence as set forth in SEQ ID NO: 82.

19. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 84.

20. A vector comprising a nucleotide sequence encoding CPRB protein including
an amino acid sequence as set forth in SEQ ID NO: 84.

21. A vector according to claim 20 wherein the nucleotide sequence is set forth in
nucleotides 1033-3069 of SEQ ID NO: 82.

22. A vector according to claim 20 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.

23. A host cell transfected or transformed with the nucleic acid of claim 16.

24. A host cell according to claim 23 wherein the host cell is a yeast cell.

25. A host cell according to claim 24 wherein the yeast cell is a Candida sp.

26. A host cell according to claim 25 wherein the Candida sp. is Candida tropicalis.

27. A host cell according to claim 26 wherein the Candida tropicalis is Candida
tropicalis 20336.

28. A host cell according to claim 27 wherein the Candida tropicalis is H5343 ura-.

29. A method of producing a CPRB protein including an amino acid sequence as
set forth in SEQ ID NO: 84 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 84; and

b) culturing the cell under conditions favoring the expression of the protein.

30. The method according to claim 29 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

31. Isolated nucleic acid encoding a CYP52A1A protein having the amino acid
sequence set forth in SEQ ID NO: 95.

32. Isolated nucleic acid comprising a coding region defined by nucleotides 1177-
2748 as set forth in SEQ ID NO: 85.

33. Isolated nucleic acid according to claim 32 comprising the nucleotide sequence
as set forth in SEQ ID NO: 85.

34. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 95.

35. A vector comprising a nucleotide sequence encoding CYP52A1A protein
including an amino acid sequence as set forth in SEQ ID NO: 95.

36. A vector according to claim 35 wherein the nucleotide sequence is set forth in
nucleotides 1177-2748 of SEQ ID NO: 85.

37. A vector according to claim 35 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.

38. A host cell transfected or transformed with the nucleic acid of claim 31.

39. A host cell according to claim 38 wherein the host cell is a yeast cell.

40. A host cell according to claim 39 wherein the yeast cell is a Candida sp.

41. A host cell according to claim 40 wherein the Candida sp. is Candida tropicalis.

42. A host cell according to claim 41 wherein the Candida tropicalis is Candida
tropicalis 20336.

43. A host cell according to claim 42 wherein the Candida tropicalis is H5343 ura-.

44. A method of producing a CYP52A1A protein including an amino acid
sequence as set forth in SEQ ID NO: 95 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 95; and

b) culturing the cell under conditions favoring the expression of the protein.

45. The method according to claim 44 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

46. Isolated nucleic acid encoding a CYP52A2A protein having the amino acid
sequence set forth in SEQ ID NO: 96.

47. Isolated nucleic acid comprising a coding region defined by nucleotides 1199-
2767 as set forth in SEQ ID NO: 86.

48. Isolated nucleic acid according to claim 47 comprising the nucleotide
sequence as set forth in SEQ ID NO: 86.

49. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 96.

50. A vector comprising a nucleotide sequence encoding CYP52A2A protein
including an amino acid sequence as set forth in SEQ ID NO: 96.

51. A vector according to claim 50 wherein the nucleotide sequence is set forth in
nucleotides 1199-2767 of SEQ ID NO: 86.

52. A vector according to claim 50 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.

53. A host cell transfected or transformed with the nucleic acid of claim 46.

54. A host cell according to claim 53 wherein the host cell is a yeast cell.

55. A host cell according to claim 54 wherein the yeast cell is a Candida sp.

56. A host cell according to claim 55 wherein the Candida sp. is Candida tropicalis.

57. A host cell according to claim 56 wherein the Candida tropicalis is Candida
tropicalis 20336.

58. A host cell according to claim 57 wherein the Candida tropicalis is H5343 ura-.

59. A method of producing a CYP52A2A protein including an amino acid
sequence as set forth in SEQ ID NO: 96 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 96; and

b) culturing the cell under conditions favoring the expression of the protein.

60. The method according to claim 59 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

61. Isolated nucleic acid encoding a CYP52A2B protein having the amino acid
sequence set forth in SEQ ID NO: 97.

62. Isolated nucleic acid comprising a coding region defined by nucleotides 1072-
2640 as set forth in SEQ ID NO: 87.

63. Isolated nucleic acid according to claim 62 comprising the nucleotide sequence
as set forth in SEQ ID NO: 87.

64. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 97.

65. A vector comprising a nucleotide sequence encoding CYP52A2B protein
including an amino acid sequence as set forth in SEQ ID NO: 97.

66. A vector according to claim 65 wherein the nucleotide sequence is set forth in
nucleotides 1072-2640 of SEQ ID NO: 87.

67. A vector according to claim 65 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.

68. A host cell transfected or transformed with the nucleic acid of claim 61.

69. A host cell according to claim 68 wherein the host cell is a yeast cell.

70. A host cell according to claim 69 wherein the yeast cell is a Candida sp.

71. A host cell according to claim 70 wherein the Candida sp. is Candida tropicalis.

72. A host cell according to claim 71 wherein the Candida tropicalis is Candida
tropicalis 20336.

73. A host cell according to claim 72 wherein the Candida tropicalis is H5343 ura-.

74. A method of producing a CYP52A2B protein including an amino acid
sequence as set forth in SEQ ID NO: 97 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 97; and

b) culturing the cell under conditions favoring the expression of the protein.

75. The method according to claim 74 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

76. Isolated nucleic acid encoding a CYP52A3A protein having the amino acid
sequence set forth in SEQ ID NO: 98.

77. Isolated nucleic acid comprising a coding region defined by nucleotides 1126-
2748 as set forth in SEQ ID NO: 88.

78. Isolated nucleic acid according to claim 77 comprising the nucleotide sequence
as set forth in SEQ ID NO: 88.

79. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 98.

80. A vector comprising a nucleotide sequence encoding CYP52A3A protein
including an amino acid sequence as set forth in SEQ ID NO: 98.

81. A vector according to claim 80 wherein the nucleotide sequence is set forth in
nucleotides 1126-2748 of SEQ ID NO: 88.

82. A vector according to claim 80 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.

83. A host cell transfected or transformed with the nucleic acid of claim 76.

84. A host cell according to claim 83 wherein the host cell is a yeast cell.

85. A host cell according to claim 84 wherein the yeast cell is a Candida sp.

86. A host cell according to claim 85 wherein the Candida sp. is Candida tropicalis.

WO 00/20566 PCT/US99/20797

87. A host cell according to claim 86 wherein the Candida tropicalis is Candida
tropicalis 20336.

88. A host cell according to claim 87 wherein the Candida tropicalis is H5343 ura-.

89. A method of producing a CYP52A3A protein including an amino acid
sequence as set forth in SEQ ID NO: 98 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 98; and

b) culturing the cell under conditions favoring the expression of the protein.

90. The method according to claim 89 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

91. Isolated nucleic acid encoding a CYP52A3B protein having the amino acid
sequence as set forth in SEQ ID NO: 99.

92. Isolated nucleic acid comprising a coding region defined by nucleotides 913-
2535 as set forth in SEQ ID NO: 89.

93. Isolated nucleic acid according to claim 92 comprising the nucleotide
sequence as set forth in SEQ ID NO: 89.

94. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 99.

95. A vector comprising a nucleotide sequence encoding CYP52A3B protein
including an amino acid sequence as set forth in SEQ ID NO: 99.

96. A vector according to claim 95 wherein the nucleotide sequence is set forth in
nucleotides 913-2535 of SEQ ID NO: 89.

97. A vector according to claim 95 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.

98. A host cell transfected or transformed with the nucleic acid of claim 91.

99. A host cell according to claim 98 wherein the host cell is a yeast cell.

100. A host cell according to claim 99 wherein the yeast cell is a Candida sp.

101. A host cell according to claim 100 wherein the Candida sp. is Candida tropicalis.

102. A host cell according to claim 101 wherein the Candida tropicalis is
Candida tropicalis 20336.

103. A host cell according to claim 102 wherein the Candida tropicalis is H5343 ura-.

104. A method of producing a CYP52A3B protein including an amino acid
sequence as set forth in SEQ ID NO: 99 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 99; and

b) culturing the cell under conditions favoring the expression of the protein.

105. The method according to claim 104 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

106. Isolated nucleic acid encoding a CYP52A5A protein having the amino acid
sequence set forth in SEQ ID NO: 100.

107. Isolated nucleic acid comprising a coding region defined by nucleotides
1103-2656 as set forth in SEQ ID NO: 90.

108. Isolated nucleic acid according to claim 107 comprising the nucleotide
sequence as set forth in SEQ ID NO: 90.

109. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 100.

110. A vector comprising a nucleotide sequence encoding CYP52A5A protein
including an amino acid sequence as set forth in SEQ ID NO: 100.

111. A vector according to claim 110 wherein the nucleotide sequence is set forth
in nucleotides 1103-2656 of SEQ ID NO: 90.

112. A vector according to claim 110 wherein the vector is selected from the
group consisting of plasmid, phagemid, phage and cosmid.

113. A host cell transfected or transformed with the nucleic acid of claim 106.

114. A host cell according to claim 113 wherein the host cell is a yeast cell.

115. A host cell according to claim 114 wherein the yeast cell is a Candida sp.

116. A host cell according to claim 115 wherein the Candida sp. is Candida tropicalis.

117. A host cell according to claim 116 wherein the Candida tropicalis is
Candida tropicalis 20336.

118. A host cell according to claim 117 wherein the Candida tropicalis is H5343 ura-.

119. A method of producing a CYP52A5A protein including an amino acid
sequence as set forth in SEQ ID NO: 100 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 100; and

b) culturing the cell under conditions favoring the expression of the protein.

120. The method according to claim 119 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

121. Isolated nucleic acid encoding a CYP52A5B protein having the amino acid
sequence as set forth in SEQ ID NO: 101.

122. Isolated nucleic acid comprising a coding region defined by nucleotides
1142-2695 as set forth in SEQ ID NO: 91.

123. Isolated nucleic acid according to claim 122 comprising the nucleotide
sequence as set forth in SEQ ID NO: 91.

124. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 101.

125. A vector comprising a nucleotide sequence encoding CYP52A5B protein
including the amino acid sequence as set forth in SEQ ID NO: 101.

126. A vector according to claim 125 wherein the nucleotide sequence is set forth
in nucleotides 1142-2695 of SEQ ID NO: 91.

127. A vector according to claim 125 wherein the vector is selected from the
group consisting of plasmid, phagemid, phage and cosmid.

128. A host cell transfected or transformed with the nucleic acid of claim 121.

129. A host cell according to claim 128 wherein the host cell is a yeast cell.

130. A host cell according to claim 129 wherein the yeast cell is a Candida sp.

131. A host cell according to claim 130 wherein the Candida sp. is Candida tropicalis.

132. A host cell according to claim 131 wherein the Candida tropicalis is
Candida tropicalis 20336.

133. A host cell according to claim 132 wherein the Candida tropicalis is H5343 ura-.

134. A method of producing a CYP52A5B protein including an amino acid
sequence as set forth in SEQ ID NO: 101 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 101; and

b) culturing the cell under conditions favoring the expression of the protein.

135. The method according to claim 134 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

136. Isolated nucleic acid encoding a CYP52A8A protein having the amino acid
sequence set forth in SEQ ID NO: 102.

137. Isolated nucleic acid comprising a coding region defined by nucleotides 464-
2002 as set forth in SEQ ID NO: 92.

138. Isolated nucleic acid according to claim 137 comprising the nucleotide
sequence as set forth in SEQ ID NO: 92.

139. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 102.

140. A vector comprising a nucleotide sequence encoding CYP52A8A protein
including an amino acid sequence as set forth in SEQ ID NO: 102.

141. A vector according to claim 140 wherein the nucleotide sequence is set forth
in nucleotides 464-2002 of SEQ ID NO: 92.

142. A vector according to claim 140 wherein the vector is selected from the
group consisting of plasmid, phagemid, phage and cosmid.

143. A host cell transfected or transformed with the nucleic acid of claim 136.

144. A host cell according to claim 143 wherein the host cell is a yeast cell.

145. A host cell according to claim 144 wherein the yeast cell is a Candida sp.

146. A host cell according to claim 145 wherein the Candida sp. is Candida tropicalis.

147. A host cell according to claim 146 wherein the Candida tropicalis is
Candida tropicalis 20336.

148. A host cell according to claim 147 wherein the Candida tropicalis is H5343 ura-.

149. A method of producing a CYP52A8A protein including an amino acid
sequence as set forth in SEQ ID NO: 102 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 102; and

b) culturing the cell under conditions favoring the expression of the protein.

150. The method according to claim 149 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

151. Isolated nucleic acid encoding a CYP52A8B protein having the amino acid
sequence set forth in SEQ ID NO: 103.

152. Isolated nucleic acid comprising a coding region defined by nucleotides
1017-2555 as set forth in SEQ ID NO: 93.

153. Isolated nucleic acid according to claim 152 comprising the nucleotide
sequence as set forth in SEQ ID NO: 93.

154. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 103.

155. A vector comprising a nucleotide sequence encoding CYP52A8B protein
including an amino acid sequence as set forth in SEQ ID NO: 103.

156. A vector according to claim 155 wherein the nucleotide sequence is set forth
in nucleotides 1017-2555 of SEQ ID NO: 93.

157. A vector according to claim 155 wherein the vector is selected from the
group consisting of plasmid, phagemid, phage and cosmid.

158. A host cell transfected or transformed with the nucleic acid of claim 151.

159. A host cell according to claim 158 wherein the host cell is a yeast cell.

160. A host cell according to claim 159 wherein the yeast cell is a Candida sp.

161. A host cell according to claim 160 wherein the Candida sp. is Candida tropicalis.

162. A host cell according to claim 161 wherein the Candida tropicalis is
Candida tropicalis 20336.

163. A host cell according to claim 162 wherein the Candida tropicalis is H5343 ura-.

164. A method of producing a CYP52A8B protein including an amino acid
sequence as set forth in SEQ ID NO: 103 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 103; and

b) culturing the cell under conditions favoring the expression of the protein.

165. The method according to claim 164 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

166. Isolated nucleic acid encoding a CYP52D4A protein having the amino acid
sequence set forth in SEQ ID NO: 104.

167. Isolated nucleic acid comprising a coding region defined by nucleotides 767-
2266 as set forth in SEQ ID NO: 94.

168. Isolated nucleic acid according to claim 167 comprising the nucleotide
sequence as set forth in SEQ ID NO: 94.

169. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 104.

170. A vector comprising a nucleotide sequence encoding CYP52D4A protein
including an amino acid sequence as set forth in SEQ ID NO: 104.

171. A vector according to claim 170 wherein the nucleotide sequence is set forth
in nucleotides 767-2266 of SEQ ID NO: 94.

172. A vector according to claim 170 wherein the vector is selected from the
group consisting of plasmid, phagemid, phage and cosmid.

173. A host cell transfected or transformed with the nucleic acid of claim 166.

174. A host cell according to claim 173 wherein the host cell is a yeast cell.

175. A host cell according to claim 174 wherein the yeast cell is a Candida sp.

176. A host cell according to claim 175 wherein the Candida sp. is Candida tropicalis.

177. A host cell according to claim 176 wherein the Candida tropicalis is
Candida tropicalis 20336.

178. A host cell according to claim 177 wherein the Candida tropicalis is H5343 ura-.

179. A method of producing a CYP52D4A protein including an amino acid
sequence as set forth in SEQ ID NO: 104 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 104; and

b) culturing the cell under conditions favoring the expression of the protein.

180. The method according to claim 179 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

181. A method for discriminating members of a gene family by quantifying the
amount of target mRNA in a sample comprising:

a) providing an organism containing a target gene;

b) culturing the organism with an organic substrate which causes
upregulation in the activity of the target gene;

c) obtaining a sample of total RNA from the organism at a first point in
time;

d) combining at least a portion of the sample of the total RNA with a
known amount of competitor RNA to form an RNA mixture, wherein the competitor RNA is
substantially similar to the target mRNA but has a lesser number of nucleotides compared to the
target mRNA;

e) adding reverse transcriptase to the RNA mixture in a quantity sufficient
to form corresponding target DNA and competitor DNA;

f) conducting a polymerase chain reaction in the presence of at least one
primer specific for at least one substantially non-homologous region of the target DNA within the
gene family, the primer also specific for the competitor DNA;

g) repeating steps (c-f) using increasing amounts of the competitor RNA
while maintaining a substantially constant amount of target RNA;

(h) determining the point at which the amount of target DNA is
substantially equal to the amount of competitor DNA;

(i) quantifying the results by comparing the ratio of the concentration of
unknown target to the known concentration of competitor; and

(j) obtaining a sample of total RNA from the organism at another point in
time and repeating steps (d-i).
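
By way of illustration only, the equivalence point of steps (g) through (i) can be estimated numerically. The following Python sketch is not part of the claimed method; it assumes that band intensities for the target and competitor PCR products have already been measured for each competitor amount, and that the logarithm of the target/competitor ratio varies linearly with the logarithm of the competitor amount. The function names `equivalence_point` and `fold_induction` and all input values are hypothetical.

```python
import math


def equivalence_point(competitor_amounts, target_signals, competitor_signals):
    """Estimate the competitor amount at which target and competitor products are equal.

    Fits log10(target/competitor signal) against log10(competitor amount) by
    ordinary least squares and returns the amount where the fitted ratio is 1.
    """
    xs = [math.log10(c) for c in competitor_amounts]
    ys = [math.log10(t / k) for t, k in zip(target_signals, competitor_signals)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A log ratio of 0 means target product equals competitor product.
    return 10 ** (-intercept / slope)


def fold_induction(amount_first, amount_later):
    """Ratio of target amounts estimated at two time points (step j)."""
    return amount_later / amount_first


# Hypothetical titration: constant target, increasing competitor (arbitrary units).
competitor = [1.0, 10.0, 100.0, 1000.0]
target_band = [50.0, 50.0, 50.0, 50.0]
competitor_band = [1.0, 10.0, 100.0, 1000.0]
estimate = equivalence_point(competitor, target_band, competitor_band)
```

In this idealized titration the fitted line crosses a ratio of 1 at the competitor amount that matches the target, so the estimate serves as the target quantity for that time point. Repeating the calculation for a later RNA sample and passing both estimates to `fold_induction` gives the relative change in expression.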

182. A method according to claim 181 wherein the target gene is selected from
the group consisting of a CPR gene and a CYP gene.

183. A method according to claim 182 wherein the CPR gene is selected from the
group consisting of a CPRA gene (SEQ ID NO: 81) and a CPRB gene (SEQ ID NO: 82).

184. A method according to claim 182 wherein the CYP gene is selected from the
group consisting of CYP52A1A gene (SEQ ID NO: 85), CYP52A2A gene (SEQ ID NO: 86),
CYP52A2B gene (SEQ ID NO: 87), CYP52A3A gene (SEQ ID NO: 88), CYP52A3B gene (SEQ
ID NO: 89), CYP52A5A gene (SEQ ID NO: 90), CYP52A5B gene (SEQ ID NO: 91), CYP52A8A
gene (SEQ ID NO: 92), CYP52A8B gene (SEQ ID NO: 93) and CYP52D4A gene (SEQ ID NO:
94).




185. A method for increasing production of a dicarboxylic acid comprising:

a) providing a host cell having a naturally occurring number of CPRA genes;

b) increasing, in the host cell, the number of CPRA genes which encode a CPRA
protein having the amino acid sequence as set forth in SEQ ID NO: 83;

c) culturing the host cell in media containing an organic substrate which
upregulates the CPRA gene, to effect increased production of dicarboxylic acid.

186. A method for increasing the production of a CPRA protein having an amino
acid sequence as set forth in SEQ ID NO: 83 comprising:

a) transforming a host cell having a naturally occurring amount of CPRA protein
with an increased copy number of a CPRA gene that encodes the CPRA protein having the amino
acid sequence as set forth in SEQ ID NO: 83; and

b) culturing the cell and thereby increasing expression of the protein compared
with that of a host cell containing a naturally occurring copy number of the CPRA gene.

187. A method for increasing production of a dicarboxylic acid comprising:

a) providing a host cell having a naturally occurring number of CPRB genes;

b) increasing, in the host cell, the number of CPRB genes which encode a CPRB
protein having the amino acid sequence as set forth in SEQ ID NO: 84;

c) culturing the host cell in media containing an organic substrate which
upregulates the CPRB gene, to effect increased production of dicarboxylic acid.

188. A method for increasing the production of a CPRB protein having an amino
acid sequence as set forth in SEQ ID NO: 84 comprising:

a) transforming a host cell having a naturally occurring amount of CPRB protein
with an increased copy number of a CPRB gene that encodes the CPRB protein having the amino
acid sequence as set forth in SEQ ID NO: 84; and

b) culturing the cell and thereby increasing expression of the protein compared
with that of a host cell containing a naturally occurring copy number of the CPRB gene.

189. A method for increasing production of a dicarboxylic acid comprising:

a) providing a host cell having a naturally occurring number of CYP52A1A genes;

b) increasing, in the host cell, the number of CYP52A1A genes which encode a
CYP52A1A protein having the amino acid sequence as set forth in SEQ ID NO: 95;

c) culturing the host cell in media containing an organic substrate which
upregulates the CYP52A2A gene, to effect increased production of dicarboxylic acid.

190. A method for increasing the production of a CYP52A1A protein having an
amino acid sequence as set forth in SEQ ID NO: 95 comprising:

a) transforming a host cell having a naturally occurring amount of CYP52A1A
protein with an increased copy number of a CYP52A1A gene that encodes the CYP52A1A protein
having the amino acid sequence as set forth in SEQ ID NO: 95; and

b) culturing the cell and thereby increasing expression of the protein compared
with that of a host cell containing a naturally occurring copy number of the CYP52A1A gene.

191. A method for increasing production of a dicarboxylic acid comprising:

a) providing a host cell having a naturally occurring number of CYP52A2A genes;

b) increasing, in the host cell, the number of CYP52A2A genes which encode a
CYP52A2A protein having the amino acid sequence as set forth in SEQ ID NO: 96;

c) culturing the host cell in media containing an organic substrate which
upregulates the CYP52A2A gene, to effect increased production of dicarboxylic acid.

192. A method for increasing the production of a CYP52A2A protein having an
amino acid sequence as set forth in SEQ ID NO: 96 comprising:

a) transforming a host cell having a naturally occurring amount of CYP52A2A
protein with an increased copy number of a CYP52A2A gene that encodes the CYP52A2A protein
having the amino acid sequence as set forth in SEQ ID NO: 96; and

b) culturing the cell and thereby increasing expression of the protein compared
with that of a host cell containing a naturally occurring copy number of the CYP52A2A gene.

193. A method for increasing production of a dicarboxylic acid comprising:

a) providing a host cell having a naturally occurring number of CYP52A2B genes;

b) increasing, in the host cell, the number of CYP52A2B genes which encode a
CYP52A2B protein having the amino acid sequence as set forth in SEQ ID NO: 97;

c) culturing the host cell in media containing an organic substrate which
upregulates the CYP52A2B gene, to effect increased production of dicarboxylic acid.

194. A method for increasing the production of a CYP52A2B protein having an
amino acid sequence as set forth in SEQ ID NO: 97 comprising:

a) transforming a host cell having a naturally occurring amount of CYP52A2B
protein with an increased copy number of a CYP52A2B gene that encodes the CYP52A2B protein
having the amino acid sequence as set forth in SEQ ID NO: 97; and

b) culturing the cell and thereby increasing expression of the protein compared
with that of a host cell containing a naturally occurring copy number of the CYP52A2B gene.

195. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A3A genes; 

b) increasing, in the host cell, the number of CYP52A3A genes which encode a 
CYP52A3A protein having the amino acid sequence as set forth in SEQ ID NO: 98; 

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A3A gene, to effect increased production of dicarboxylic acid. 

196. A method for increasing the production of a CYP52A3A protein having an 
amino acid sequence as set forth in SEQ ID NO: 98 comprising: 

a) transforming a host cell having a naturally occurring amount of CYP52A3A 
protein with an increased copy number of a CYP52A3A gene that encodes the CYP52A3A protein 
having the amino acid sequence as set forth in SEQ ID NO: 98; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A3A gene. 

197. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A3B genes; 

b) increasing, in the host cell, the number of CYP52A3B genes which encode a 
CYP52A3B protein having the amino acid sequence as set forth in SEQ ID NO: 99; 

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A3B gene, to effect increased production of dicarboxylic acid. 

198. A method for increasing the production of a CYP52A3B protein having an 
amino acid sequence as set forth in SEQ ID NO: 99 comprising: 

a) transforming a host cell having a naturally occurring amount of CYP52A3B 
protein with an increased copy number of a CYP52A3B gene that encodes the CYP52A3B protein 
having the amino acid sequence as set forth in SEQ ID NO: 99; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A3B gene. 

199. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A5A genes; 

b) increasing, in the host cell, the number of CYP52A5A genes which encode a 
CYP52A5A protein having the amino acid sequence as set forth in SEQ ID NO: 100; 

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A5A gene, to effect increased production of dicarboxylic acid. 

200. A method for increasing the production of a CYP52A5A protein having an 
amino acid sequence as set forth in SEQ ID NO: 100 comprising: 

a) transforming a host cell having a naturally occurring amount of CYP52A5A 
protein with an increased copy number of a CYP52A5A gene that encodes the CYP52A5A protein 
having the amino acid sequence as set forth in SEQ ID NO: 100; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A5A gene. 

201. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A5B genes; 

b) increasing, in the host cell, the number of CYP52A5B genes which encode a 
CYP52A5B protein having the amino acid sequence as set forth in SEQ ID NO: 101; 

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A5B gene, to effect increased production of dicarboxylic acid. 

202. A method for increasing the production of a CYP52A5B protein having an 
amino acid sequence as set forth in SEQ ID NO: 101 comprising: 

a) transforming a host cell having a naturally occurring amount of CYP52A5B 
protein with an increased copy number of a CYP52A5B gene that encodes the CYP52A5B protein 
having the amino acid sequence as set forth in SEQ ID NO: 101; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A5B gene. 

203. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A8A genes; 

b) increasing, in the host cell, the number of CYP52A8A genes which encode a 
CYP52A8A protein having the amino acid sequence as set forth in SEQ ID NO: 102; 

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A8A gene, to effect increased production of dicarboxylic acid. 

204. A method for increasing the production of a CYP52A8A protein having an 
amino acid sequence as set forth in SEQ ID NO: 102 comprising: 

a) transforming a host cell having a naturally occurring amount of CYP52A8A 
protein with an increased copy number of a CYP52A8A gene that encodes the CYP52A8A protein 
having the amino acid sequence as set forth in SEQ ID NO: 102; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A8A gene. 

205. A method for increasing production of a dicarboxylic acid comprising: 
a) providing a host cell having a naturally occurring number of CYP52A8B genes; 
b) increasing, in the host cell, the number of CYP52A8B genes which encode a 
CYP52A8B protein having the amino acid sequence as set forth in SEQ ID NO: 103; 

c) culturing the host cell in media containing an organic substrate which 

upregulates the CYP52A8B gene, to effect increased production of dicarboxylic acid. 

206. A method for increasing the production of a CYP52A8B protein having an 
amino acid sequence as set forth in SEQ ID NO: 103 comprising: 

a) transforming a host cell having a naturally occurring amount of CYP52A8B 
protein with an increased copy number of a CYP52A8B gene that encodes the CYP52A8B protein 
having the amino acid sequence as set forth in SEQ ID NO: 103; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A8B gene. 

207. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52D4A genes; 

b) increasing, in the host cell, the number of CYP52D4A genes which encode a 
CYP52D4A protein having the amino acid sequence as set forth in SEQ ID NO: 104; 

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52D4A gene, to effect increased production of dicarboxylic acid. 

208. A method for increasing the production of a CYP52D4A protein having an 
amino acid sequence as set forth in SEQ ID NO: 104 comprising: 

a) transforming a host cell having a naturally occurring amount of CYP52D4A 
protein with an increased copy number of a CYP52D4A gene that encodes the CYP52D4A protein 
having the amino acid sequence as set forth in SEQ ID NO: 104; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52D4A gene. 

209. A method for discriminating members of a gene family according to claim 
181 wherein culturing the organism with the organic substrate is accomplished in a fermentor. 

C. tropicalis 20336 CPR Allele DNA Alignment of DS Sequence 



CPRA 1 CATCA 5 
CPRB 1 TATATGRTATATGATATATCTTCCTGTCTAATTATTATTCGTATTCGTTAATACTO 70 

CPRA 5 AGATC ATCTATGGGGATAAJTA CGACAGCAACATTGCASAAAGAGCGTTGGTCACAATCGAAAi^ 70 
CPRB 71 TCTTTATTTATGAAGAAAAGGAGAGTTCGTAAfhTGAGTTGAG^ 140 
* ** ♦*** *♦ ♦ ** ** *** * * * «■ ** 



CPRA 71 GCCTATG-GCGTTGCCGTCGTTGAGGCAAATGACAGCAC--CaACAATAACGATGGTC 137 
CPRB 141 GCAGAGGAGAGTATCCGACGftGGAGGAACTGGGTGAAArrrCATCTATGCTGTTGC 210 



CPRA 138 CTTCAGAACAGTCCATTGTTeACGCT--TAAGGCACGGATAATTACGTGGGGCAAAG6AACGCGGAATTA 205 
CPRB 211 TGTAAATCTTAGATTTCCTAGAGGTTGTTCTAGCAAATAAAGTGTTT^^ 280 



* * * 



CPRA 206 GTTATGGGGGGATCAAA--AGCGGAAGATTTGTGTTGC1TGTGGGTTTTTT 273 

CPRB 281 GGTAAAGGATCAACTGATTAGCGGAAGATTGGTGTTGCCTGTGGGGTTCTT TTATTTTTCATATGAT 347 

***** ♦ ♦ * ******* ♦**♦*♦♦*♦♦**♦♦«.» 

CPRA 274 TTCTTTGCGCAAGTAACATGTGCOUITTTAGTTTGTGATTAGCGTGCC-CCACAATTGGCATCGTGGACG 342 

CPRB 348 TTCTTTGCGCGAGTAACATGTGCCAATCTAGTTTATGATTAGCGTACCTCCACAAT^^ 417 

CPRA 343 GGCGTGTTTTGTCATACCCCAAGTCTTAACTAGCTCCACAGTCTCGACGGTGTCTCGACGAT^^ 412 

CPRB 418 GGCGTGTTTTGTCTTACCCCAAGCCTTATTTAGTTCCACAGTCTCGACGGTGTCTCGCCGATGTCTTCT 487 

CPRA 413 CCACCCCTCCCATGAATCATTCAAAGTTGTTGGGGGATCTCCACCAAGGGCACCGGAGTTAAOHSCT^ 482 



CPRA 483 TTTCTCCCACTTTGGTTSTGATTGGGGTAGTCTAGTGAGTTGGAGATTTTCTCT 552 

CPRB 549 TCTTTCCCACTTTGGTTGTGATTGGGGTAGCGTAGTGAGTTGGTGATTTTCT^^ 617 

CPRA 553 CGATATCGAAATTTGATGAATATAGAGAGAAGCCAGATCAGGACAGTAGATTGCCTTTGTAGTTAGAG^ 622 

CPRB 618 CGATATCGAA6TTTGATGAATATAG GAGCCAGATCAGCATGGTATATTGCCTTTGTAGATA6AGAT 683 



CPRA 623 GTTGAAGAGCAACTAGTTGAATTACACGCCACCACTTGACAGCAAGTCCAGTGAGCTGTAAAC^^ 692 
CPRB 684 GTTGAACAACAACTAGCTGAATTACACACCACCGCT AAACGATGCGC 730 



******** ******* ********** ***** ** 



CPRA 693 CCAGAGTGTCACCACCAACTGACGTTGGGTGGAGTT GTT GTTGTTGTTGTTGGCAGGGCCATATTGCTAA 762 
CPRB 731 ACAGGGT6TCACCGCCAACTGACGTTGGGTGGAGTTG TTGTTGGCAGGGCCATATTGCTAA 791 

CPRA 763 ACGAAGACAAGTAGCACAAAACCCAAGCTTAAGAACAAAAATAAAAAAAATTCATACGACAATTCCAAAG 832 

CPRB 792 ACGAAGAGZWVGTAGCACAAAACCCAAGGTTAAGAACAA TTAAAAAAATTCATACGACAATTCCACAG 858 

******* ******************* *********** * ********************.#^** itit 

CPRA 833 CCATTGATTTACATAAT--CAACAG-TAAGACAGAAAAAACTTTCAACATTTCAAAGTTCCCTTTTTCC^ 899 

CPRB 859 CCATTTACATAATCAACAGCGACAAATGAGACAGAAAAAACTTTCAACATTTCAAAGTTCCCTTTTTCCT 928 
***** * ** ** * *** * ****************«>***********«"*.«.«.*«.«.*«.ik^««.*« 

CPRA 900 ATTACTTCTTTTTTTTCTTCTTTCCTT CTTTCCTTCTGTTTTTCTTACTTTATCAGTCTTTTA 962 

CPRB 929 ATTACTTCTTTTTTTCTTTCCTTCCTTTCATTTCCTTTCCTTCTGCTTTTATTACTTTACCAGTC^ 998 
*************** *** ****** *********** **** ******** ********* 

CPRA 963 CTTGTTTTTGCAATTCCTCATCCTCCTCCTACTCCTCCTCAC CATG GCTTTAGACAAGTTAGATTTGTAT 1032 
CPRB 999 CTTGTTTTTGCAATTCCTCATCCTCCTCCT CACCATGGCTTTAGACAAGTTAGATTTGTAT 1059 

CPRA 1033 GTCATCATAACATTGGTGGTCGCTGTAGCCGCXrrATTTTG^ 1102 

CPRB 1060 G^rCATa^]\ACATrGGTGGTCGC^ 1129 

CPRA 1103 ACACCGGGTTCCTCAACACGGACAGCGGAAGCi^CTCCAGAGACGT^ 1172 

CPRB 1130 ACACCGGGTTCCTCiUVCACGGACAGCGGAAGCAACTCCAGAGACGTCTTGCTGACAT^^ 1199 

CPRA 1173 TAAAAACACGTTGTTGTTGTTTGGGTCCCAGACGGGTACGGCAGAAGATTACGCCAACAAATT^ 1242 

CPRB 1200 TAAAAACACGTTGTTGTTGTTTGG<ntXX:AGACCGGTACGGCAGA 1269 

CPRA 1243 GAATTGCACTCCAGATTTGGCTTGAAAACGATGGTTGCAGATTTCGCTGATTACGATTGG^ 1312 

CPRB 1270 GAATTGCACTCCAGATTTGGCTTGAAAACCATGGTTGCAGATTTCGCTGATTACGA^ 1339 

CPRA 1313 GAGATATCACCGAAGACATCTTGGTGTTTTTCATTGTTGCCACCTATGG^ 1382 

CPRB 1340 GASATATCACCGAAGATATCTTGGTGTTTTTCATCGTTGCCACCTACGGT^ 1409 

CPRA 1383 TCCCGACGAGTTCCACACCTGGTTGACTGAAGAAGCTGACACTTTGAGTACCTTGAAAT^ 1452 

CPRB 1410 TGCCGACGAGTTCCACAarrGGTTGACTGAAGAAGCTGACACTTTGAGTACTTT^ 1479 

CPRA 1453 GGGTTGGGTAACTCCACGTACGAGTTCTTCAATGCCATTGGTAGAAAGTTTGACAGATTGTTGAGCGAG^ 1522 

CPRB 1480 GGGTTGGGTAACTCCACCTACGAGTTCTTCAATGCTATTGGTAGAAAGTTTGACAGATTGT^^ 1549 

CPRA 1523 AAGGTG6TGACAGGTTTGCTGAATACGCTGAAGGTGATGACGGTACTGGCACCTT66AC^ 1592 

CPRB 1550 AAGGTGGTGACAGATTTGCTGAATATGCTGAAGGTGACGACGGCACTGGCACCTTGGACGAAGA 1619 

CPRA 1593 GGCCTGGAAGGAOUlTGTCTTTGACGCCWGAAGAATGATrPGAACT^^ 1662 

CPRB 1620 GGCCTGGAAGGATAATGTCTTTGACGCCTTGAAGAATGACTTGAACTTTGAAGAAAAGGAATTGAAGTAC 1689 

CPRA 1663 GAACCAAACGTGAAATTGACTGAGAGAGACGACTTGTCTGCTGCTGACTCCCAAGTTTCCTTGTO 1732 

CPRB 1690 GAACCAAACGTGAAATTGACTGAGASAGATGACTTGTCTGCTGCCGACTCCCAAGTTTCCTTGGGTGAGC 1759 



CPRA 1733 CAAACAAGAAGTACATCAACTCCGAGGGCATCGACTTGACCAAGGGTCCATTCGACCACACCCACCCATA 1802 
CPRB 1760 CAAACAAGAAGTACATCAACTCCGAGGGCATCGACTTGACCAAGGGTCCATTCGACCACACCCACCCATA 1829 



CPRA 1803 CTTGGCCAGAATCACCGAGACGAGAGAGTTGTTCAGCTCCAAGGACAGACACTGTATCCACGTTGAAT^ 1872 
CPRB 1830 CTTGGCCAGGATCACCGAGACCAGAGAGTTGTTCAGCTCCAAGGAAAGACACTGTATTCACGTTGAATTT 1899 



CPRA 1873 GACATTTCTGAATCGAACTTGAAATACACCACCGGTGACCATCTAGCTATCTGGCCATCCAACTCCGACG 1942 
CPRB 1900 GACATTTCTGAATCGAACTTGAAATACACCACCGGTGACCATCTAGCCATCTGGCCATCCAACTCCGACG 1969 



CPRA 1943 AAAACATTAAGCAATTTGCCAAGTGTTTCGGATTGGAAGATAAACTCGACACTGTTATTGAATTGAAGGC 2012 
CPRB 1970 AAAACATCAAGCAATTTGCCAAGTGTTTCGGATTGGAAGATAAACTCGACACTGTTATTGAATTGAAGGC 2039 



CPRA 2013 GTTGGACTCCACTTACACCATCCCATTCCCAACCCCAATTACCTACGGTGCTGTCATTAGACACCATTTA 2082 
CPRB 2040 ATTGGACTCCACTTACACCATTCCATTCCCAACTCCAATTACTTACGGTGCTGTCATTAGACACCATTTA 2109 



CPRA 2083 GAAATCTCCGGTCCAGTCTCGAGACAATTCTTTTTGTCAATTGCTGGGTTTGCTCCTGATGAAGAAACAA 2152 
CPRB 2110 GAAATCTCCGGTCCAGTCTCGAGACAATTCrTTTTGTCGATTGCTGGGTTTGCTCCTGATGAAGAAAC^ 2179 



CPRA 2153 AGAAGGCTTTTACCAGACTTGGT6GTGACAAGCAAGAATTCGCCGCCAAGGTCACCCGCAGAAAGTTCAA 2222 
CPRB 2180 AGAAGACTTTCACCAGACTTGGTGGTGACAAACAAGAATTC6CCACCAAGGTTACCCGCAGAAAGTTCAA 2249 

CPRA 2223 CATTGCCGATGCCTTGTTATATTCCTCCAACAACGCTCCATGGTCCGATGTTX^^ 2292 
CPRB 2250 CATTGCCGATGCCTTtnTATATTCCTCCAACAACACTCCA 2319 



CPRA 2293 GAAAACGTTCCACACrrGACTCCACGTTACTACTCCAT^^ 2362 
CPRB 2320 GAAAACATCCAACACTTGACTCCACGTTACTACTCCATTTCTTCTTCGTCG^ 2389 



CPRA 2363 TCAACGTTACTGCAGTTGTTGAAGCCGAAGAAGAAGCTGATGGCAGACCAGTCACTGGT^^ 2432 
CPRB 2390 TCAATGTTACTGCaGTCGTTGAGGCCGAAfiAAGAAGCaSATGGCAGACCAGTCACl^^ 2459 



CPRA 2433 CTTGTTGaWVGAACGTOGAAATTGTGCAAAACAAGACTGGCGAAAAGCCACTTCT^ 2502 
CPRB 2460 CTTGTTGAAGAACATTGAAATTGCGCAAAACAAGACTGGCGAAAAGCCACTTGTTCACTACGA 2529 



CPRA 2503 GGCCCAAGAGGCAAGTTCAACAAGTTCAAGTTGOMTGCATGTGAGAAGATCCAACTTC 2572 
CPRB 2530 GGCCCAAGAGGCMGTTCAACWISTTCAAGTTGCCAGTGCACGTGAGAAGMCC^ 2599 



CPRA 2573 AGAACTCCACCACCCCAim'ATCTTGArrGGTCCAGGTACTGGTGTTGCCCCATTGAGAG^ 2642 
CPRB 2600 AGAACTCCACCACCCCAGTTATCTTGATTGGTCCAGGTACTGGTGTTGCCCCATTGAGAGGTT^^ 2669 



CPRA 2643 AGAAAGAGTTCAACAAGTCJttGAATGGTGTCAATGTTGGCAAGACTTTGTTGT^^ 2712 
CPRB 2670 AGAAAGAGTTCAACAAGTCAAGAATGGTGTCAATGTTGGCAAGACTTTGTTGTTTTATGGTTGCA6 2739 



CPRA 2713 TCCJUUMAGGACTTTTTGTACAAGCAAGAATGGGCCGAGTACGCTTCTGTT^^ 2782 
CPRB 2740 TCCAACGAGGACTTTTT6TAC»AGCAA6AAT66GCCGJU>TACGCTTCTGTTT^^ 2809 



CPRA 2783 TGTTCAAIGCCTTCTCCAGACAAGACCCATCCAAGAAGGTTTACGTCCAGGATAAGATm 2852 
CPRB 2810 TGTTCAATGCCTTCTCTAGACAAGACCCATCCAAGAAGGTTTACGTCOUKaTAAGATTOT 2879 



CPRA 2853 CCAACTTGTGCACGAGTTGTTGACTGAAGGTGCCATTATCTACGTCTGTGGTGATGCCAGTAGAATG^ 2922 
CPRB 2880 CCAACTTGTGCACGAATTGTTGACCGAAGGTGCCATTATCTACGTCTGTGGTGACGOCAGTAGAATGGCC 2949 



CPRA 2923 AGAGACGTGCAGACCACAATTTCCAAGATTGTTGCTAAAAGCAGAGAAATTAGTGAAGACAAGGCTGCTG 2992 
CPRB 2950 AGAGACGTCCA6ACCACGATCTCCAAGATTGTTGCCAAAAGCAGAGAAATCAGTGAAGACAAGGCCGCTG 3019 



CPRA 2993 AATTGGTCAAGTCCTGGAAGGTCCAAAATAGATACCAAGAAGATGTTTGGTAGACTCAAACGAATCTCTC 3062 
CPRB 3020 AATTGGTCAAGTCCTGGAAAGTCCAAAATA6ATACCAAGAAGATGTTTGGTAGACTCAAACGAATCTCTC 3089 



CPRA 3063 TTTCTCCCAACGCATTTATGAATCTTTATTCTCATTGAAGCTTTAaVTATGTTCTACACTT^ 3132 
CPRB 3090 TTTCTCCCAACGCATTTATGAA TATTCTCATTGAAGTTTTACATATGTTCTATATTTCATTTTTTT 3155 



CPRA 3133 TTTTT1TTTTATTATTATATTACGAAACATA6GTCAACTATATATACTTGATTAAATGTTATAGAAACAA 3202 
CPRB 3156 TTT ^ATTATATTACGAAACATAGGTCAACTATATATACTTGATTAAATGTTATAGAAACAA 3215 



CPRA 3203 TAACTATTATCTACTCGTCTACTTCTTTGGCATTGACATCAACATTACCGTTCCCATTACCGTTGCCGTT 3272 

CPRB 3216 TAATTATTATCTACTCGTCTACTTCTTTGGCATTGGCATTGGCATTGGCATTGGCATTTO^ 3285 

CPRA 3273 GGCAATGCOGGGATATTTAGTACAGTATCTCCAATCCGGATTTGAGCTATTGTAGATaUOTGCAAGTCA 3342 

CPRB 3286 GGTAATGCCGGGATATTTAGTACAGTATCTCCAATCCGGATOTGAGCTATTGTAAATCAGCTGCA 3355 



CPRA 3343 TTCTCCACCTTCAACCAGTACTTATACTTCATCTTTGACTTCAAGTCCAAGTCA^ 3412 
CPRB 3356 TTCTCCACCTTCAACCAGTACTTATACTTCATCTTTGACTTCAAGTCCAAGTCATAW 3425 

CPRA 3413 GCAAGAACTTCTGGCXATCCACGATATAGACGTTATT^ 3482 

CPRB 3426 GCAAGAACTTCTGGCCATCCACAATATAGACGTTATTCACGTTAT^^ 3495 

CPRA 3483 CTTATTGAACTTCTCAAACTTOWIAACAACCCCACGTC 3552 

CPRB 3496 CTTATTGAACTTCTCAAACTTCAAAAACAACCCCACGTCCCGCAACGTCATT^ 3565 

CPRA 3553 CTCACGTCGTCGGAGCTCGTCAAGTTCTCAATTAGATCGTTCTTGTTATTGATCTT^^ 3622 

CPRB 3566 CTCACGTCGTCGGJ\GCTCGTCAAGTTCTCAATTAGATCGTTCT^^ 3635 

CPRA 3623 ATTGCTGGAACACATTGTCCTCGTTGTTCAAATAGATCTTGAACAACTTTTTCA^ 3692 

CPRB 3636 ACTGCTGGAACACATTGTCCTCGTTGTTCAAATAGATCTTGAACAACT^^ 3705 

CPRA 3693 AATCTGGGCCAAGATCTCCGCCGGGATCTTCAGAAACAAGTCCTGCAACCCCTGGTCGATG^^ 3762 

CPRB 3706 GATCTGGGCCAAGATTTCCGCCGGGATCTTCAGAAACAAGTCCTGCAACCCCTGG^^ 3775 

CPRA 3763 TACAACAAGTCCAAGGGGCAGayVGTGTCTAGGCACGTGTTTCAACTGGTTCAACGAACATGTTCG^ 3832 

CPRB 3776 TACAACAAGTCTAAGGGGCAGJUUSTGTCTAGGCACGTGTTTCAACTGG^ 3845 

CPRA 3833 AGTTCGAGTTATAGTTATOGTACAACCATTTTGGTTTGATTTCGAAAATGACGGAGCTGA^ 3902 

CPRB 3846 AGTTCGAGTTATAGTTATCGTACAACCACTTTGGCTTGATTTCGAAAATGACGGAGCTGATCCCATCATT 3915 

CPRA 3903 CTCCT6GTTCCTCTCATAGTACAACTGGCACTTCTTCGAGA6GC1KAATTC 3972 



CPRB 3916 3985 



CPRA 3973 ATATTCGGCAACAAGAGCCCGTACCGCTCACGGAGCATCAAGTCGTGGCCCTGGTTGTTCAACTTGT^ 4042 
CPRB 3986 ATATTCGGCAACAAGAGCCCGTAGCGCTCACGGAGCATCAAGTCGTGGCCCTGGTTGTTCAACTTGT 4055 



CPRA 4043 TGAAGTCCGAGGTCAAGACAATCAACTGGATGTCGATGATCTGGTGCGGGAACAAGTTCTTGCATOT 4112 
CPRB 4056 TGAAGTCCGATGTCAAGACAATCAACTGGATGTCGATGATCTGGTGCGGAAACAAGTTCTTGCACTTTAG 4125 



CPRA 4113 CTCGATGAAGTCGTACAACTCACACGTCGAGATATACTCCTGTTCCTCCTTCAAGAGCCGGATCCGC^ 4182 

CPRB 4126 CTCGATGAAGTCGTACAACT 4145 

CPRA 4183 AGCTTGTGCTTCAAGTAGTCGTTG 4206 

CPRB 4146 4145 
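
The rows of asterisks printed beneath some CPRA/CPRB row pairs above appear to mark columns in which the two alleles carry the same base. A minimal sketch of how such a marker row, and a percent-identity figure, could be computed from two gapped rows of equal length is given here. It assumes that '-' denotes a gap; the function names and the toy rows are hypothetical illustrations and are not taken from the sequence listing.

```python
def identity_line(row_a: str, row_b: str, gap: str = "-") -> str:
    """Return '*' under each column where both aligned rows hold the same base."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    marks = []
    for a, b in zip(row_a.upper(), row_b.upper()):
        # A column counts as identical only if neither row has a gap there.
        marks.append("*" if a == b and a != gap else " ")
    return "".join(marks)


def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent of gap-free columns in which the two rows agree."""
    marks = identity_line(row_a, row_b, gap)
    gap_free = sum(1 for a, b in zip(row_a, row_b) if a != gap and b != gap)
    return 100.0 * marks.count("*") / gap_free if gap_free else 0.0


if __name__ == "__main__":
    # Toy rows for illustration only (hypothetical, not CPRA/CPRB data).
    allele_a = "ACG-TTGA"
    allele_b = "ACGATCGA"
    print(allele_a)
    print(allele_b)
    print(identity_line(allele_a, allele_b))
    print(f"{percent_identity(allele_a, allele_b):.1f}% identity over gap-free columns")
```

Columns containing a gap in either row are left unmarked and are excluded from the denominator; other conventions (for example, counting gaps as mismatches) would give different percentages.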
CPRA MALDKLDLYVIITLVVAVAAYFAKNQFLIXJPQDTGFIJJTDSGSNSRDVLLTIJaCNNK^ 60 
CPRB MALDKLDLYVIITLVVAVAAYFAKNQFLIX)PQDTGFLNTDSGSNSRDVIJ*TIJCK^^ 60 



CPRA IJiFGSQTGTAEDYANKLSRELHSRFGIJCTMVADETVDyDWDNFGDITEDILVFFIVATYGE 120 

CPRB LLFGSQTGTAEDYftNKLSRELHSRFGIJerMVMFADYDWDNFGDITEDILVFFIVATyGE 120 

★ 

CPRA GEPTDNADEFHTWLTEEADTLSTIJCYTVFGLGNST YEFFNAIGRKFDRLLSEKGGDRFAE 180 

CPRB GEPTDNADEFHTffLTEEADTLSTLRYTVFGLGNST YEFFNAIGRKFDRLLSEKGGDRFAE 180 



CPRA YAKGDDGTGTLDEDFMAWKDNVFDALKNDIJJFEEKELKYEPNVKI.TERDDLSAADSQVS^ 240 

CPRB raEGDDGTGTLDEDFMAWKDWFDALKNDLNFEEKELKYEPNVKLTERDDLSAADSQVSL 240 

CPRA GEPNKKYINSEGIDLTKGPFDHTHPYLARITETRELFSSKDRHCIHVEFDISESNLKYTT 300 

CPRB GEPNKKYINSEGIDLTKGPFDHTHPYLARITETRELFSSKERHCIHVEFDISESNLKYTT 300 



CPRA GDHLAIWPSNSDENIKQFAKCFGLEDKLDTVIELKALDSTYTIPFPTPITYGAVIRHHLE 360 

CPRB GDHIJaWPSNSDENIKQFAKCFGI£DKLDTVIELKALDSTYTIPFPTPITYGAVIRHHLE 360 

CPRA ISGPVSRQFFLSIAGFAPDEETKKAFTRIX3GDKQEFAAK\n'RRKFNIADALLYSSNN^ 420 

CPRB ISGPVSRQFFLSIAGFAPDEETKECTFTRLGGDKQEFATKVTRRKFNIADALLYSSNNTPW 420 

CPRA SDVPFEFLIENVPHLTPRYYSISSSSLSEKQLINVTAVVEAEEEADGRPVTGWTNLIJC^ 480 

CPRB SDVPFEFLIENIQHLTPRYYSISSSSLSEKQLINVTAVVEAEEEADGRPVTGVVTNLLKN 480 

* * 

CPRA VEIVQNKTGEKPLVHYDLSGPRGKFNKFKLPVHVRRSNFKLPKNSTTPVILIGPGTGVAP 540 

CPRB lEIAQNKTGEKPLVHYDLSGPRGKFNKFKLPVHVRRSNFKLPKNSTTPVILIGPGTGVAP 540 



CPRA LRGFVRERVQQVKNGVNVGKTLLFYGCRNSNEDFLYKQEWTVEYASVLGENFEMFNAFSRQ 600 

CPRB LRGFVRERVQQVKNGVNVGKTLLFYGCRNSNEDFLYKQEWAEYASVLGENFEMFNAFSRQ 600 



CPRA DPSKKVYVQDKILENSQLVHELLTEGAI I YVCGDASRMARDVQTTISKI VAKSREISEDK 660 
CPRB DPSKKVYVQDKILENSQLVHELLTEGAIIYVCGDASRMARDVQTTISKIVAKSREISEDK 660 



CPRA AAELVKSWKVQNRYQEDVW 680 
CPRB AAELVKSWKVQNRYQEDVW 680 

C. tropicalis 20336 CYP52 DNA Alignment of DS Sequence 



CYP52A1A 1 0 
CYP52A2A 1 GACCTGTGACGCTTCCGGTGTCTTGCCACCAGTCTCCAAGTTGACCGACGCC 70 
CYP52A2B 1 0 
CYP52A3A 1 GACATCATAAT 11 
CYP52A3B 1 0 
CYP52A5A 1 0 
CYP52A5B 1 TTACAATCATGG 12 
CYP52A8A 1 0 
CYP52A8B 1 0 
CYP52D4A 1 0 



CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



1 CATATGCGCTAATCTTCTTTTTCTTTTTATCACAGGAGAAACT^^^ 59 

71 ATTTCCGGTTACACTTCCAAGATGGCTGGTACTGaVAGAAGGT^ 140 

^ 0 

12 GACCCGGTTATTTCGCCCTCAGGTTGCTTATTTGAGCCOTAAAGTGCAGTAGAAACTTO 81 

^ • .0 

^ TGGAGTC 7 

13 AGCTCGCTAGGAACCCAGATGTCTGGGAGAAGCTCCGCGAAGAGGTCAACACGAACTTTGGCATGGAGTC 82 
1 0 



CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



^^^1^^^^'*^^'^^^^^^ 129 

141 CTTGTTTCGGTCAACOITTCTTGGTGTTGCACCCMTGAAGTACGC^ 210 

^ GCTCAACAATTGTCTGACAAGATCTC 26 

82 AAACTCTAGTATAATGGTGATAACTGGTTGCACTCTTGCXaVTAGGCATGA^ 151 

8 GCCAGACTTGCnxawnrrPTGACTCCCTTCGAAAC^^ 77 

83 GCCAGACTTGCTCACTTTTGACTCTCTTAGAA6CTCAAA6TAC6TTCAGGCTO 152 
1 0 
^ AAAACCGATACAAGAAGAAGACA6TCAA 28 
^ 0 



CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



130 GACCTCCAGTCAAACGGACa^GACAGACAAACACTTGGTGCGATGTTCATACCTACAG^ 
211 GCAACACAAGGCTAACGCCKOTTGTTGAACaCCGGTTGGGrrGGTTC^ 

152 ATATTTAATAAGCGTAGGAGTATAGGATGCATATGACCGGTTTTTCTATATTTTTAAGATAATCTCTM^ 
^ CCTGCAGA 

78 CGTATCTACCCGGGGGTACCACGAAACATGAAGACAG--CTACGTGCAACACGACGTTGCCACGCGGAG6 

153 C6TATCTACCCG6G66TGCCAC6AAACATGAAGACAG — CTACGTGCAACACGACGTTGCCGCGTGGAGG 
1 

29 CAAG3UW:GTTAATGTCAACCAGGCGCCAAGAAGACGG--TTTGGCG6ACTTGGAA^ 

1 



199 
280 
96 
221 
8 

145 
220 
0 
96 
0 



CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



200 TGTTTAGACGACGGriTCTTGCAAAGAC-AGGTGTTGGCATCTCGTACGATGGCAACTGC^ 
281 AGATGCTCATTGAAGTACACaVGAGCCATTTTGGACGCTATCCy^CTCTGGTGAATTGTCCMGGTTGAAT 
97 AGATGTTCATTGAAGTACACCAGAGCCATTTTGGACGCTATCCACTCTGGTGAATTGTCCAAGGTTGAAT 
222 AAATTTTGTATTCTCAGTAGGATTTCATCAAATTTCGO^CCAATTCTGGCGAAAAAATGATO 

9 ATTCGCGGCCGCGTCGACAGAGTAGCAGTTATGCMGCATGTGATTGTGGTTTTTGCAACCTGTTTGCAC 
146 AGGCA-AAGACGGCAAGGAACCTATCT-TGGTGCAGAAGGGACAGTCCGTTGGGTTGATTACTATTGCCA 
221 AGGCA-AAGACGGTAAGGAAOCTATTT-TGGTGCaiGAAGGGCCAGTCCGTTGGGTTGATTACTAT^ 

1 

97 CCATG-ATGTTTATGTTCTGGAGAGGT-TTTTCAAGGAATCGTCATCCTCCGCCACCACAAjGAAC 
1 

350 
166 
291 
78 
213 
288 
0 

164 
0 

CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



269 ACTTCTCCTTTAGGCAATAGAAAAAGACTAAGAGAAOUXrGT^^ 
351 ACGAAA CTTT CCCaGirCTTCflACTTGMlTGTOa^^ 
167 ACGAGACTTTCCCAGTCTTCMCTTGAATGTCCC^^ 

292 GTCAAAAGCTGA-ATAGTGaWOTTAAAGCACCTAAAATCACATATACAGCCTCTAGATACGAC^^ 

79 GACAAATGATCG-ACAGT-C»ATT--ACGTAATCrATATTATTTAGAGGGGTAATAAAAAAT;^^ 
214 CGCAGACGGACCXAGAGTATTTTGGGGCCGACGCTGGTGAGTTTAAGCCGGAGAGAT^^ 
289 CGCAGACGGACCCAGAGTATTTTGGGGCAGATGCTGGTGAGTTCAAACCGG^ 
1 

165 (OTAACGAQ^TCCATAaTCACAACCCACCGCAAGGTGACJU^TGCrrCaAC^^ 

■I 



338 
420 
236 
36C 
144 
281 
356 
0 

232 



CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



339 ATTTTTTTAGTCCCAGCATTCTGTGGGTTGCTCTGGGTTTCTAGAA 

421 CCCAACCAAGGCCTGGACCGG-AAGGTGTTGACTCCrrTaUiCAAGGAAATC^^ 

237 CCCAACCAAGGCXTGGACCG'-AAGGTGTTGACrrCCTTCAACAAGGAAATCAAGTC^ 

361 GCTCTTTATGATCTGAAGAAGCATTAGAATAGCT ACTATGAGCCACTATTGGTGTATATATTAGGGA 

145 GCC AGAATTTCAAACATTTTGOVAACJ^TGCAAAAGATGAGAAACTCCAA 

282 AGCATGAAGAACTTGGGGTGTAAATACTT<X:CGTTCAATGCTGGGCCACGGACTT^^ 
357 AGCATGAAGAAOTTGGGGTGTAAGTACnTGCCGTTCAATGCTGGGCCCCGGACTTGT^^ 
1 

233 ACCCCCAOUlGAACAGTGGAATAATGCCAGTCAA-awUlGAGTGGTGACAGACGAGGGAGAA 
1 



408 
489 
304 
427 
210 
351 
426 
0 

301 
0 



CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



409 TTCAGATGGAAGAACAAAGAGATAAAAAACAAAAAAAAACTGAGTTTTGCACCAATAGAATGTTTG 474 

490 TTGCTGAAAAC--TTCAAGACCTATGCTGACCAAGCTACCGCTGA--AGTGAGAGCTGCA^ 555 

305 TT6CTGAAAAC--TTCAAGACCTATGCTGACCAAGCTACCGCrGA--AGTTAGAGCTGC^^ 370 

428 TTGGTOaU^TTAAGTACGTACTAATAAACaGAAGAAAATACTTAACCaATTTC^^ 497 

211 ACTCCGCAGC--ACTCCGAACCAACAAAACAATGGGGGGCGCCAG--AATTATTGAC TATT 267 

352 ACACTTTGATTGAAGOSAGCTACirrGCTAGTCCGGTTGGCC^^ 416 

427 ACACTTTGATTGAAGCGAGCrrATTTGCrrAGTCAGGTTGGCGC^ 491 

1 0 

302 CAACAGTGGTTCTGATGCAAGATaUSCTACACCGCTTCATCA^ 366 

1 GATGTGGTGCTTGATTTCTCGAGACACaTCCTTGTGAGGTGCCaTGAAT^ 58 



CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



475 -ATGATATCATCCACTC6CTAAACGAATCATGTGGGTGATCTTCTCTTTAGTTTTGGTCT 543 

556 CrPTAAAGATATTTATTCATTATTTAGTTTGCCTATTTATTTCTCATTACCCATC-ATCATTC^ 624 

371 CTTAAAGATATTTATTCACTATTTASTTTGCCTATTTATTTCTCATCACCCATC-AT^ 439 

498 -TGAGGGACCTTTTCrrGAACATTCGGGTCAAACTTTTTTTTGGAfiTGCGJ^ 566 

268 GTGACriTTTTTTTATTTTTTCCGTTAA--CTTrCAOT 329 

417 -CTlGCaiGGATCGGCGTACC-CACCAAGAAAGAAGTCGTTGATCJVACATGAGTGCTGCCGAC^^ 484 

492 -CTGCC3lGGGTCG6CGTACC-<J^CCAAGAAAGAA6TCGTTGATaUlTATGAGTGCTGCCG^ 559 

1 0 
367 -CATATGCOrATCACGAGCAACACCAGCAGGTTAGTGTATAGTAGTCTGTAGTTAAGTCAATGCAATGTA 435 

59 -TCTGTAAGGAaVGGGAACTGCTTCAACACCTTATTGO^TATTCTGTCTATTGC^^ 127 



CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



544 ACATGAAAGTGAAATCCAAA-TACACTACACTCCGGGTATTGTCCTTCGTTTTACAGATGTCTCATTGTC 612 

625 ATATAAAGTTACTTCGGA TATCATTGTAATCGTGCGTGTCGCAATTG6ATGATTTGGAA 683 

440 ATATAAAGTTATTTCGGAAC-TCATA TATCATTGTAATCGTGCGTGTTGCAATTGGGTAATTTGAAA 505 

567 AATAATAGTGTUVCCrrrTGTG-TAATAAATCPTCATGCAAGACTTGCATAATTCGAGCT 635 

330 GATGGTGTTGGTTTCTACAA-TGCAAGGGCACAGTTGAAGGTTTCCACATAACGT-TGCAa^ 397 

485 TGT — AAAGCTTTATAAGGA-TGTAACGGTAGATGGATAGTTGTGTAGGAGGAGCGGAGATAAATTA6AT 551 

560 TGT — ^AAAGTTTCACMGGA-TCTAGATGGATATGTA-AGGTGTGTAGGAGGAGCGGAGATAAATTAGAT 625 

1 0 

436 CCA — ATAAGACTATCCCTT-CTTACAACCAAGTTTTCTGCCGCGCCTGTCTGGCA-ACAGATGCTGGCC 501 

128 GATATCTGCCAAGGTATATAGCAGAACGTGCTGATGGTTCCTCCGGTCATATTCTGTTGGTAGTTCTGCA 197 

CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



613 TTACTTTTQ^GGlX»TAGGaGTTGCCTGTGAGAGAT^ 682 

684 CTXXXXrTTGWUW^GATTCATGCACGAAGC^^ AATTTATCTCCTGAGACA 749 

506 CTGTAGTTGeAACGGATTCATGCACGATGCGGAGA-TAACACG AGATTATCTCCTAAGACA 565 

636 C--CAATTTGACCTCGTTCATGTGATAAAAGAAAAGCCAAAAGGTAATT AGCAGAC6C AATGGG 697 

398 T--C AATTO ATCCTCATT C31IG IGATAAAAGAAGAGCCAAAAGCT^ GGCAGACCCCCCAAGGGG 462 

552 TTGATTTTG TGTAAGGTTTTGGATGTCAACCTACTCCGCACTTCATGCA-GTCT 617 

626 TTGATTTTG TGTAAGGTTTAGCACGTCAAGCTACTCCGCACTTTGT GTGTAGGGAGCACA 685 

^ 0 

502 GACACACTT TaUUn?GAGTTTGGTCTAGAATTCTTGCAaiTGCACGACA-AGG3UUl 567 

198 GGTAAATTTGGATGTCAGGTAGTGGAGGGAGGTTTGTATCGGTTGTGTT-TTCTTCTTCC^ 266 



CYP52A1A 683 TCCTArCTCATGCTGTGTGTCTCTGGTTGGTTCATGAGTTTGGATT--GTTGTAC^^ 

CYP52A2A 750 ATTTTAGa:GTGTTCACACGCCCTTCTTTGTT-CTGAGCGAAGGAT--;^ 

CYP52A2B 566 ATTTTGGCCTCATTCACACGCCCTTCTT CTGAGCTAAGGAT— AAATAATTAGACTTCACAAGTT 

CYP52A3A 698 AAGATGGAGTGGAAAGCAATGGAAGCACGCCC-AGGACGGAGTAATTTAGTCCACACTACATCT 

CYP52A3B 463 AACACGGAGTAGAAAGCAATGGAAACyvCGCCC-ATGACAGTGCCATTTAGCCCACAACACATCT;^ 

CYP52A5A 618 GTGTACTACGTGTGCGTGTGCGCCAAGAGACA GCCCAAGGGGG— TGGTAGTGT-GTGTTGGCGGAA 

CYP52A5B 686 TACTC CGTC TGCGCCTGTGCCAAGAGACG GCCCAGGGG TAGTGT-GTGGTGGTGGAA 

CYP52A8A 1 GAATTCTTTGGATCTAATTCCAGCTGATC TTGCTAATCCT— TATCAACGTAGTTGTGATCATT 

CYP52A8B 568 — ACAACACTTGTGCTCTGATGCCACTTGATC TTGCTAAGCCT— TATCAACGTAATTGAGATCATT 

CYP52D4A 267 ATTaulCCTCXaM:GTCTCCTTCGGGTTCTGTGTCTGTGTCrGAGTC--6TA^ 



750 
816 

628 
766 
531 
681 
741 
62 
630 
334 



CYP52A1A 
CYP52A2A 

751 GGWUIGCAAAGCTAACTAAATTTTCTTTGTCACAGGTACACTAACCTGTAAAACTTC^ 

817 CATTCTAATTTCC6T CACGCGAATATTGAA GGGGGGTACATGTGGCCGCTGAA- 

629 aiTTAAAATATCCGT---avCGCGAAAACIXKyACAAT 

767 -TTTTTTTTTG TGCGCaAGTACACACCTGGACT-TTAGTTTTTGCCCCATAAAGTTAACAAT^ 

532 CTTTTTTTTrriTCTGCGCAGGTGCACACCTGGACT-TTAGTTATTGCCCCaVTAAAGT^ 

682 GTGCATGT6ACACA ^ACGOGTGGGTTCTGGCCAATGGTGGACTAAGTGCAGGTAAGCAGCGACCTGAA 

63 GTTTGTCTG3U1TTAT— ACACACOVGTGGAAGAATAT^ — 

631 GTTT6TCTGAATTAT--ACACACCAGTGGAAGAATCTGGTCTAATCTGCACGCCTCATGGGCAT^^ 
335 



820 
869 
694 
830 
599 
748 
608 
128 
696 
404 



CYP52A1A 821 TCTTTCCTGATTGGGCAAGTGCACAAACTACA-ACCTGCAAAACAG CACTCCGCTTGTCACAGGTT 885 

CYP52A2A 870 -TGTGGGGG— CAGTAAACGCAGTCTCTC ^ CTCTCCCAGGAATAGTGCAACGG 918 

CYP52A2B 695 -TGTGGGGTGCCAGTAAACGOVGTCTCTCTCTCCCCCXrCC^^ 763 

CYP52A3A 831 -CCITTGGC-TCTCC^ACTCTCPCCGCCCCCAAATATTCGTTTTT-ACACCC^^ 897 

CYP52A3B 600 -CCrTTGGC-TCTCCCAGTGTCTCCGCCTCCAGATGCTCGTTTT--ACACCCTCGA^^ 665 

CYP52A5A 749 ACATTCCTCAACGCTTAA6ACACTGGTGG-TAGAGATGCGGACCAGG CTATTCTTGTCGT-GCTA 811 

CYP52A5B 809 ACACTCCTCAACGCTTGAGACACTGGTGGGTAGAGATGCGGGCCAGGA — GGCTATTCTTGTCGT-GCTA 875 

CYP52A8A 129 -TGTTT GTGGGGGGGGGGGGGTGCACACATTTTTAGTGCCA^ TTCTTTGTTGATTAC-CCCT 187 

CYP52A8B 697 -TGTTTT--GGGG6GGGGGGGGGGGGTGCACACATTTTTAGTGCGAAT6TTTGTTTGCTGGrTCC-CCCT 762 

CYP52D4A 405 GTGCACGACCATGAGTATGCAACTTGACGAGACGTCGTTAGGA ATCCACAGAATGATAGCAGGAA 469 



CYP52A1A 886 GTCTCCTCTCAACCyACAAAAAAATAAGArTAAACTTTCTTTGCTaVTGCATCAATCGGA^ 955 

CYP52A2A 919 AGGAAGGATAACGGATAGAAAGCGGAATGCGAGGAAAAT— TTTGAACGCGCAAGAAAAGCAATATCCGG 986 

CYP52A2B 764 GGGAAGGATAACGGATAGCAAGTGGAATGCGAGGAAAAT— TTTGAATGCGCAAGGAAAGCAATATCCGG 831 

CYP52A3A 898 AACACCCATTAGAGGAATGGGGCAAAGTTAAACyvCTTTTGGCTTCAATGATTCCTATTCG^ 967 

CYP52A3B 666 AACACCOITGAGGGGAATGGG-CAAAGTTAAACACTTTTGGTTTCAATGATTCCTATTTGCTACT 729 

CYP52A5A 812 CCCGGCGCATGGA-AAATCAACTGCGGGAAGAA— TAAATTTATCCGTAGAATCCACAGAGCG G 872 

CYP52A5B 876 CCCG-TGCACGGA-AAATCGATTGAGGGAAGAA — CAAATTTATCCGTGAAATCCACAGAGCG G 935 

CYP52A8A 188 CCCCCCTATCAT TCATTCCCACAGGATTAG--TTTTTTCCTCACTGGAATTCGCTGTCC 244 

CYP52A8B 763 CCCCCCTCCCCCCTATCATGCCCACAGGATTAG---TTTTTTCCTCACTGGAATTCGCTGTCC 822 

CYP52D4A 470 GCTTACTACGTGAGAGATTCTGCTTAGAGGATG--TTCTCTTCTTGTTGATTCCATTAGGTGGGTATCAT 537 

CYP52A1A 956 A--AAGAGOTGCCTTTGTGTAATGTGTGCCA^ 1016 

CYP52A2A 987 GCTACCAGCTTTTGAGCCAGGGAAOiCAC^^ 1050 

CYP52A2B 832 GCTATCaiGGTTTTGAGCCAGGGGACACACrK^ 900 

CYP52A3A 968 CITCTCTTGTITTGTGCTTTGJ^ 1034 

CYP52A3B 730 CTCTTGTTTTGTGTTTTGATTTGCACCIIICTG^ 793 

CYP52A5A 873 A--TAAATTTGCCXACCTCCATCATCftACC3«X3-CCGCCACT 933 

CYP52A5B 936 A--TAAATTTGTCACATTGCTGCGTTGCCCAC CCACAGCATTCTC 978 

CYP52A8A 245 ACCTGTCAACCCCCCXCCCXrCCCXXr-CCaCTGCC--CT 293 

CYP52A8B 823 ^ACCTGTCAACCCCCTCAC TGCCCTGCCCTGC 853 

CYP52D4A 538 CTCCGGTGGTGACAACTTGACACAAGCAGTTCCGAGAACCACCaW3^^ 601 



CYP52A1A 1017 TTCCCTCACAATTATATAAACTCACXXaVCATTTCCACAGACCGTAATTTC^ 1085 

CYP52A2A 1051 CACCAAGACGCAATGAAAOXACATGGACATTTAGACCTCCCCyvCATGTGATAGr^^ 1115 

CYP52A2B 901 AACTCCACCAAGACACAATGAATCGCAaVTGGACArrTAGACCTC^ 970 

CYP52A3A 1035 CCTCCTCCTATATCTCTTTTTGCTAC-ATTTTGTTTTTTACGT^ 1103 

CYP52A3B 794 TGTCCTCCAATGTCTCTTTTTGCTGCCATTTT^^ 863 

CYP52A5A 934 CICTCTCTCTCTTTGTCITACrCCGCTCCCGTTTCC^ 1002 

CYP52A5B 979 TTTTCTCTCTCTTTGTCTTACTCCGCTCCTGTTTCCTTATCCAGAA^ 1048 

CYP52A8A 294 CCTGCACGTCCTGTGTTTTGTGCTGTGTCTTTCCOVCGCTATAAAAGCCCTGGCGTCCG^ 363 

CYP52A8B 854 CCTGCACGCCCTGTGrPTTGTGCTGTGGCACTCCCACGCTATAAAAjt^CCTG 923 

CYP52D4A 602 TATCACTTCTACATGTCAACCTACGATGTATCTCaTCAC^ 671 



CYP52A1A 1086 GCTCTTCTTTTACTTAGTCAGGTTTGATAACTTCCTTTTTTATTACCCT^ 

CYP52A2A 1116 AGA ^AAAGTATAATAAGAACXXATGCCGTCCCnTTTCTTTCGCCGCTTCAACT^^ 

CYP52A2B 971 AAAGCAAAAAAAGTATAATAAGGACCCATGCCTTCCCTCTTCCTGGGCCGTTTCAACTTl^ 

CYP52A3A 1104 ACAA ASAAAAAAAAACTACAC TATGT CGTCTTCTCCATCGTTT 

CYP52A3B 864 ACAATCAGTGCAGCAAO^aiCAAAGAAGAAAAATAAAAAAACCTACaCTATCT 

CYP52A5A 1003 GCA—ACAATTATAAAGATACGCC AGGCCCaCCTTCTTTCTTl " rnr r i^CTT T TTTGACT6C 

CYP52A5B 1049 ACG — CTAGCCCAGCTGTCTTTCT TTTTCTTCACTTTTTTTGGTGTGTTGCTTTTTTGGCT^ 

CYP52A8A 364 TCC»CCCAGCXAAAAAAACAGTCrrAAAAAATTTGGTTGATCCTTTTTGGTTGC^ CCAC-C 

CYP52A8B 924 TCCTCACAGCCAAAAAAA AATTTGGCTGATCCTTTTGGGCTGCAAGGTTTTTCACCAC-C 

CYP52D4A 672 ATGGGTCAACATCCAATACAACTCCMCAA--TGAAGAAGAAAAACGGWUIGCAG^ 



1155 
1179 
1040 
1146 
933 
1064 
1110 
429 
982 
739 



CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



1156 ATTTATACCAACCAACC — AACCATGGCCACACAAGAAATaVTCGATTCTGTACTTCCGTACTTGTlCCAA 1223 

1180 TCTT ACACACATCACGAC CA-TG ACT6TACACGATATTATCGCCACATACTTCACCAA 1236 

1041 TTGTCTATCAACACACACACACCTCACGAC CA-TGA CTGCACAGGATATTATCGCC^ 1109 

1147 GCCC AAGAGGTTCPCGCTACa^AGTCCTTACATCGAGTACTTTCTTGACA-ACT^ 1208 

934 GCTC AGGAGGTTCTCGCTACCACTAGTCCTTACATCGAGTACTTTCPKy^ 995 

1065 ACTTTCTACAATCCavCCACAGCCACCACCACAGCCGCTATGATTGAACAACTCCT 1127 

1111 ACTTTCTACAACC ACCACCACCACCACCACCATGATTGAACAAATCCTAGAATATT 1166 

430 ACCACTTCCACCA — CCTCAACTATTCGAACAA- - AAGATGCTCGATCAGATCTTACATTACT 488 

983 ACCACCaCCACCA — CCTCAACTATTCAAACAA — AG GATGC TCGACCAGATCTTCCATTACT 1041 

740 GTGTG AGTTCCTGACCATTGCTAATCTA-TGGCTATATCTAGTTTGCTATCGTGGGATG 797 



CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
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1224 ATGGTACACTGTGATTACTGaVGCAGTATTAGTCTTCCTTATCTCCACAAACATCAAGAACTACGTCAAG 1293 

1237 ATGGTACGTGATAGTACCACTCGCTTTGATTGCTTATAGAGTCCTCGACTACrPrCTATGGCAGATAC^ 1306 

1110 ATGGTACGTGATAGTACCACTCGCTTTGATTGCTTATAGGGTCCTCGACTACTTTTACGGCAGATACCT 1179 

1209 ATGGTACTACTTCATACCrrTGGTGCTTCTTTCGTTGAACTTTATAAGTTTGCTCCA^ 1278 

996 ATGGTACTACTTCATCCCTTTGGTGCrTCTTTCGTTGAACTTCATCAGCTTGCrrCCACAC^^ 1065 

1128 --GGTATGTCGTTGTGCCAGTGTTGTACATCATCAAACAACTCCTTGCATACACAAAGACTCGCGTC5^ 1195 

1167 --GGTATATTGTTGTGCCTGTGTTGTACATCATCAAAOVACTCATTGCCTACAGCAAGACrrCG^ 1234 

489 --GGTACATTGTCTTGCCATTGTTGGCCATTATCAACCAGATCGTGGCTCATGTCAGGACCAATTATTTG 556 

1042 — GGTACATTGTCTTGCCATTGTTGGTCATT ATaVAGCAGATCGTGGCTCATGCCAGGACCAATTATTTG 1109 

798 -TGATCTGTGTCGTCTTCATTTGCGTTTGTGTTTATTTCGGGTAT-GAATATTGTTATACTAAATACTTG 865 
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CYP52A1A 
CYP52A2A 
CYP52A8A
CYP52A8B 
CYP52D4A 



1294 GCSUUtfaAATTGAAATGTGTCGATCCACCATACTT^^ 1363

1307 ATCTACAAGCiTCGTGCTAAACCATTTTTCXAGAAACAGAa^ 1376 

1180 ATGTACAAGCTTGGTGCTAAACCGTTTTTCCAGAAACAAACAGACGOT 1249 

1279 GAACGCAGGTTCCACGCCAAGCCACTCGGTAACTTT61X»GGGAC^ 1348 

1066 GAACGCAGGTTCCACGCCAAGCCGCTCGGTAACGTCGTGTT^ 1135 

1196 ATGAaAAAGTTGGGTGCTGCTCCAGTCAaUtfiCaAGTT^ 1265

1235 ATGAAA(»GTTGGGTGCTGCrcaum»CJUUlCCAGTTGTA^ 1304 

557 ATGAAGAAATPGGGTGCTAAGCCATTCACACACGTCCAACGTGACGGGTGGTT^^ 626 

1110 ATGAAGAAGTTGGGCGCTAAGCCATTCACACATGTCCAACTAGACGGGTGGTCT 1179 

866 ATGCACAAACATGGCGCTOGAGAAATCGAGAATGTGATOUICGATGGGTT^^ 935 



CYP52A1A 1364 TCGCCGCCATCAAGGCCAAGAACGACGGTAG-ArTGGCTAACTTTGCC GATGAAGTTTT 1421 

CYP52A2A 1377 TTGAATTGTTGAAGAAGAAGAGCGACGGTAC-CCTCATAGACTTCACA CTCCAGCGTATC C 1436 

CYP52A2B 1250 TTHSAATTGTTAAAAAAGAAGAGTGACGGTAC-CCTCATAGACTTCACT CTCCAGCGTATC C 1309 

CYP52A3A 1349 TGCTTTTGATCTACTTGAAGTCGAAAGGTAC-GGTCATGAAGTTTGCTTGGGGCCTCTGGAACAACAAGT 1417 

CYP52A3B 1136 TGATCTTGATCTACTTAAAGTCGAAAGGTAC-AGTCATGAAGTTTGCCTGGAGCTTCTGGAACAACAAGT 1204

CYP52A5A 1266 GGAAGGCTCTCCAGTTCAAGAAAGAGGGCAGGGCTCAAGAGTACAACG ATTACAAGTTTG 1325 

CYP52A5B 1305 GGAAGGCTCTCCAGTTCAAGAAAGAGGGCAGAGCTCAAGAGTACAACG ATCACAAGTTTG 1364 

CYP52A8A 627 GTGAATTCCTCAAAGCAAAAAGTGCTGGGAG-ACTGGTTGATTTAATC ATCTCCCGTTT 684 

CYP52A8B 1180 GTGAATTCCTCAAAGCTAAAAGTGCTGGGAG-GCAGGTTGATTTAATC ATCTCCCGTTT 1237 

CYP52D4A 936 TGCTACTCATGC6A6CCAGCAATGAGGGCCG-ACTTATCGAGTTCAGT GTCAAGAGATTCGAGT 998 
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1422 CGACGAGTACCCAAACCACACCTTCTACTTGTCTGTTGCCGGTGCTTTGAAGATTGTCATGACTGT 1487 

1437 ACGATCTCGATCGTCCCGATATCCCAACTTTCACATTCCCGGTCTTTTCCATCAACCTTGTCAATACCCT 1506 
1310 AAeCGCTCAATCGTCCAGATATCCCAACTTTTACATTCCCAATCTTTTCCATCAACCTTATCAGCACCCT 1379 
1418 ACATCGTCAGAGACCCAAAGTACAAGACAACTGGGCTCAGGATTGTTGGCCTCCCATTGATTGAAACCAT 1487 
1205 ACATTGTCAAAGACCCAAAGTACAAGACCACTGGCCTTAGAATTGTCGGCCTCCCATTGATTGAAACCAT 1274 
1326 ACCACTCCAASAACCCAAGCGTGGGCACCTACGTCAGTATTCTTTTCGGCACCAGGATCGTCGTGACCAA 1395 
1365 ACAGCTCCAAGAACCCAAGCGTCGGCACCTATGTCAGTATTCTTTTTGGCACCAAGATTGTC^ 1434 

685 CCACGA TAATGAGGACACTTTCTCCAGCTATGCTTTTGGCAACCATGTGGTGTTCACCAG 744 

1238 CCACGA TAATGAGGACACTTTCTCCAGCTATGCTTTTGGCAACCATGTGGTGTTCACCAG 1297 

999 -CGGCGCCACAT— CCACAfiAACAAGACATTGGTCAACCGGGCATTGAGCGTTCCTGTGATACTCACCAA 1065 
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CYP52A8B 
CYP52D4A 



CYP52A1A 
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1488 TGACCCA6AAAACATCAAGGCTGTCTTGGCCACCCAATTCACTGACTTCTCCTTGGGTACCAGACACGCC 1557 
1507 TGAfiCCGGAGAACATCAAGGCCATCTTGGCCACTCAGTTCAACGATTTCTCCTTGGGTACCAGACACTCG 1576 
1380 TGAGCCGGAGJACATCAAGGCTATCTTGGCCACCCAGTTCAACGATTTCTCCTTGGGCAa^ 1449 
1488 GGACCCAGAGAACATCAAGGCTGTTTTGGCTACTCAGTTCAATGATTTCTCTTTGGGAACCAGA^^ 1557 
1275 AGACCCAGAGAACATCAAAGCTGTGTTGGCTACTCAGTTCAACGATTTCTCCTTGGGAACTAGACACGAT 1344 
1396 AGATCCAGAGAATATCAAAGCTATTTTGGCAACCCAGTTTGGTGATTTTTCTTTGGGCAAGAGGCACACT 1465 
1435 GGATCCAGAGAATATCAAAGCTATTTTGGCAACCaUOTTTGGCGATTTTTCTTTGGGCAAGAGAC^ 1504 
745 GGACCCCGAGAATATCAAGGCGCTTTTGGCAACCCAGTTTGGTGATTTTTCATTGGGCAGCAGGGTCAAG 814 
1298 GGACCCCGAGAATATCAAGGCGCTTTTGGCAACCCAGTTTGGTGATTTTTCATTGGGAAGCAGGGTa 1367 




1577 CACTTTGCTCCTTTGTTGGGTGATGGTATCTTTACGTTGGATGGCGCCGGCTGGAAGCACAGCAGATC^ 1646 
1450 CACTTTGCTCCTTTGTTGGGCGATGGTATCTTTACCTTGGACGGTGCCGGCTGGAAGCACAGCAGATCTA 1519 
1558 TTCTTGTACTCCTTGTTGGGTGACGGTATTTTCACCTTGGACGGTGCTGGCTGGAAACATAGTAGAACTA 1627 
1345 TTCTTGTACTCCTTGTTGGGCGATG6TATTTTTACCTTGGACGGTGCTGGCTGGAAACACAGTA6AACTA 1414 
1466 CTTTTTAAGCCTTTGTTAGGTGATGGGATCTTCACATTGGACGGCGAAGGCTGGAAGCACAGCAGAGCCA 1535 
1505 CTTTTTAAACCTTTGTTAGGTGATGGGATCTTCACCTTGGACGGCGAAGGCTGGAAGCATAGCAGATCCA 1574 
815 TTCTTCAAACCATTATTGGGGTACGGTATCTTCACATTGGACGCCGAAGGCTGGAAGCACAGCAGAGCCA 884 
1368 TTCTTCAAACCATTGTTGGGGTACGGTATCTTCACCTTGGACGGCGAAGGCTGGAAGCACAGCAGAGCCA 1437 
1136 CAGTTTGCGCCGTTGTTGGGGAAAGGCATCTTTACTTTGGACGGCCCAGAGTGGAAGCAGAGCCGATCTA 1205 



** ** ** ***** * 
CYP52A1A 1628 TGTTGAGACCACAGTTTGCTAGAGACCAGATTGGAavCGTTAAAGCCTTGGAACC^^ 1697

CYP52A2A 1647 TGTTGAGACCACAGTTTGCCAGAGAACAGATTTCCCACGTOUUOT^^ 1716 

CYP52A2B 1520 TGTTGAGACXIACAGTTTGCCAGAGAACAGATTTCCCACGTCAAGTTGITCG/^^ 1589

CYP52A3A 1628 TGTTGAGACCACAGTTTGCTAGAGAACAGGTTTCTCACGTCAAGTT^^ 1697

CYP52A3B 1415 TGTTGAGftCCACAGTTTGCTAGAGAACAGGTTTCCXACCT 1484 

CYP52A5A 1536 TGTTtaGACCAC^GTTTGCCAGAGAACAAGTTGCTCATGTGACGT^^ 1605 

CYP52A5B 1575 TGO^AAGACCACJlGTTTtXCyiGAGAACAAGTTGCTCATGTGAC^^ 1644 

CYP52A8A 885 TGTTGAGAO^aVGTTTGCCAGAGAACAAGTTGCTCATGTGAC^^ 954

CYP52A8B 1438 TGTTGAGACCACAGTTTGCCAGAGAGCAAGTTGCTCATGTGA^ 1507 

CYP52D4A 1206 TGTTGCGTCCGOUiTTTGCCAAAGATCGGGTTTCTCATATCCTGGATCT 1275 



***** * *** 



CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
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CYP52A1A 
CYP52A2A 
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1698 
1717 
1590 
1698 
1485 
1606 
1645 
955 
1508 
1276 



1768 
1787 
1660 
1768 
1555 
1676 
1715 
1025 
1578 
1346 



GGCTAAGCAGATCAAGTTGAACCAGGGAAAGACTTTCGATATCCAAGAATTGTTCTTTAGATTTA 

CTTCAAAO^CGTCAfiAAAGGCACAGGGCAAGACTTTTGACATCCAGG^ 

CTTCAAGCACGTCAGAAAGGCACAGGGCAAGACTTTTGACyiTCCMGAATTG^ 

CTTCAAGCACGTTAGAAAGCACCGCGGTCAAACGTTCGACATOaiAGAATTGTTCOT 

CTTOlAGCACGTTAGAAAAaiCCGCGGTCAGACTTTTGACATCCAAGAATTGT^ 

GMGJU^GCATATTCTTAAGCACAAGGGTGAATACTTTGATATCCAGGAATTCT 

GAAGAAGCATATCCTTAAACACAAGGGTGAGTACTTTGATATCCAGGAATTGTTCTT^ 

GAAGAAGCATATCCTTAAACACAAGGGTGJUaTACTTTGATATCCAfiGAATTGTTCTTTA 

GAAGAAGCATATTCTTAAGCACAAGGGTGAATACTTTGATATCCAGGAATTGTTCTTTA^ 

TCGGAAGCACATTGATGGCCACAATGGAGACTACTTCGACATCXAGGAGCTCT 

** ** * ** * ** ** ***** ** * * ** * ** * * 

GACACCX;CTACTGAGTTCTTGTTTG6TGAATCCGTTC»CTC^ 

GACTCCGCCJ^CCGAGTTTTTGTTTGGTGAATCCGTTGAGTCCTTGAG^ 

GACTCCGCCACTGAGTTTTT6TTTGGTGAATCCGTTGAGTCCCT 

GACTCCGC(»CCGA6TTCTTGTTTGGTGAGTCTGCTGAATCCTTGA^^ 

GACTCCGCCACCGAGTTCTTGTTTGGTGAGTCTGCTGAATCCTTGAGAG^ 

GATTCGGCCACGGAGTTCTTATTTGGTGAGTCCGTGCACTCCTTAAAGGACGAATC^^ 

GACTCGGCCACGGACTTCTTATTTGGTGAGTCCGTGak(^^ 

GACTCGGCCACGGAGTTCTTATTTGGTGAGTCCGTGCACTCCTTAAAGGACGAGGAAATTGGC^^ 
GATTCaGCGACGGAGTTCTTATTTGGTGAGTCCGTGCACTCCTTAAGGGACGAGGAAATTGGCTACGATA 
GATGTGGCGACGGG6TTTTTGTTTGGCGAGTCTGTGGGGTCGTTGAAAGACGAAGATGCGAGG 



1767 
1786 
1659 
1767 
1554 
1675 
1714 
1024 
1577 
1345 



1837 
1856 
1729 
1837 
1624 
1745 
1784 
1094 
1647 
1408 



CYP52A1A 1838 CTCCAAACGAAA TCCCAGGAAGAGAAAACTTTGCCGCTGCTTTCAACGTTTCCCAACACTACTTGGC 1904 

CYP52A2A 1857 TCAATGCGCTTGACTTTGACGGCAAGGCTGGCTTTGCTGATGCTTTTAACTATTCGCAGAATTATTTGGC 1926 

CYP52A2B 1730 TCAATGCACTTGACTTTGACGGCAAGGCTGGCTTTGCTGATGCTTTTAACTTICTC 1799

CYP52A3A 1838 CAACCACCAAGGATTTCGATGGCAGAAGAGATTTCGCTGACGCTTTCAACTATTCGCAGACTTACC^ 1907

CYP52A3B 1625 CAACCACCAAGGATTTCGAAGGCAGAGGAGATTTCGCTGACGCTTTCAACTACTCGCAGACTTACCA« 1694 

CYP52A5A 1746 AAGACGATATAGATTTTGCTGGTAGAAAGGACTTTGCTGAGTCGTTCAACAAAGCCCAGGAATACrra 1815 

CYP52A5B 1785 AAGACGATATAGATTTTGCTGGTAGAAAGGACTTTGCTGAGTCGTTCAACAAAGCCCAGGAGTATTT^ 1854

CYP52A8A 1095 CGAAAGACATGT CTGAAGAAAGACGCAGATTTGCCGACGCGTTCAACAAGTCGCAAGTCTACGTGGC 1161 

CYP52A8B 1648 C6AAGGACATGG CTGAAGAAAGACGCAAATTTGCCGACGCGTTCAACAAGTCGCAAGTCTATTTGTC 1714 

CYP52D4A 1409 TTCCTGGAAGCATTCAATGAGTCGCAGAAGTATTTGGC 1446

** * * ** ** * * * 



CYP52A1A 1905 CACOiGAAGTTACTCCCAGACTrPITACTTTTTGACCAACCCTAAGGAATTC^ 1974 

CYP52A2A 1927 TTCGAGAGCGGTTATGCAACAATTGTACTGGGTGTTGAACGGGAAAAAGTTTAAGGAGTGCAACGC^ 1996 

CYP52A2B 1800 TTCGAGAGCGGTTATGCAACAATTGTACTGGGTGTTGAACGGGAAAAAGTT^ 1869 

CYP52A3A 1908 CTACAGArrTTTGTTGCAACAAATGTACTGGATCrTGAATG^ 1977 

CYP52A3B 1695 CTACAGATTTTTGTTGCAAC»AATGTACT6GATTTTGAAT6GCGC6GAATT^^ 1764

CYP52A5A 1816 TATTAGAACCTTGGTGCAGACGTTCrACTGGTTGGTCAACAACAAGGAGTTTAGA^ 1885 

CYP52A5B 1855 TATTAGAATTTTGGTGCAGACXnrrCTACTGGTTGATCJkACAACAAGGAGT^ 1924 

CYP52A8A 1162 CACCAGAGTTOCTTTACAGAACTTGTACTGGTTGGTCAACAACAAAGAGTTCA^^ 1231 

CYP52A8B 1715 CACCAGAGTTGCTTTACAGACATTGTACTGGTTGGTCAACAACAAAGAG^ 1784

CYP52D4A 1447 AACTAGGGCAACGTTGCACGA6TTGTACTTTCTTTGTGAC66GTTTA661^CGC(^ 1516 



Figure 15F 



WO 00/20566



PCT/US99/20797 



CYP52A1A
CYP52A2A
CYP52A2B
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



1975 GTCCACCACTTGGCaUlCTACTT^ 2044 

1997 GTGCACAAGTTTGCTGRCTACTACGTO^ 2066 

1870 GTGCACAAGTTTGCTGACTATTACGTCW;^^ 1939

1978 GTGCACAAGTTTGCTGACCACTATGTGCAAAAGGCTTTGGAGTTGACC^ 2047 

1765 GTGCACAAGTTTGCTGACCACTATGTGCAAAAGGCTTTGGAGTTGACC 1834

1886 GTGCACAA GTTC ACCAACTACTATGTTCASftAAGCrTTGGATGCT« 1955 

1925 GTCCACAA GTTTA CCaACTACTATGTTCAGAAAGCTTTGGAT^ 1994 

1232 GTCCACAAGTTTACCAACTACTATGTTCauaUU\GCCTTGGATGCTA(^ 1301 

1785 GTCCACAAGTTCACCAACTACTATGTTaUSAAAGCC^^ 1854

1517 GTGCGAAAGTTCTGCAGCaWSTGTGTCCACAAGGCGTTAGATGTT^^ ^A 1577 



CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
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CYP52A5A 
CYP52A5B 
CYP52A8A
CYP52A8B 
CYP52D4A 



2045 CCAAGTCCGGTTACGTTTTCTTGTACGAATTGGTTAAGCAAACCAGAGATCO^ 

2067 ATGGTT ATGTGTTOTTGTACGAATTGGTCAAGCAAACCAGAGACAAGCAAGTGTTGAGAGACCA 

1940 ATGGTT ATGTGTTCTTGTACGAGTTGGTCAAGCAAACCAGAGACAGGCAAGTGTTGAGAGACCA 

2048 ACGGCT ATGTGTTCTTGTACGAGTTGGCTAAGCAAACCAGAGACCCAAAGGTCTTGAGAGACCA 

1835 ACGGCT ATGTGTTCTTGTACGAGTTGGCTAAGCAAACTAGAGACCCAAAGGTCTTGAGAGACCA 

1956 GTGGGT ATGTGTTCTTGTACGAGCTTGTCAAGCAGACAAGAGACCCCAATGTGTTGCGTGACCA 

1995 GCGGGT ^ATGTGTTCTTGTATGAGCTTGTCAAGCAGACGAGAGACCCCAAGGTGTTGCGTGACCA 

1302 GCGGGT ATGTGTTCTTGTATGAGCTTGTCAAGCAGACGAGAGACCCC/^AGGTGTTGCGTGACCA 

1855 GCGGGT ATGTGTTCTTGTACGAGCTTGCCAAGCA6ACGAAAGACCCCAATGTGTTGCGTGACCA 

1578 GCGAGT ^ACGTGTTTCTCCGCGAGTTGGTCAAACACACTCGAGATCCCGTTGTTTTACAAGACCA 

♦ ******** ** *** 

2115 ATTGTTGAACATTATGGTTGCCGGAAGAGACACCACTGCCGGTTTGTTGTCCTTTGCTTTGTTTGAATTG 
2131 ATTGTTGAACATCATGGTTGCTQGTAGAGACACCACCGCCGGTTTGTTGTCGTTTGTTTTCTTTGAATTG 
2004 afTtm-GAACATCATGGTTGCCGGTAGAGACACCACCGCCGGTTTGTTGTCGTTTGT TTT C ^ 
2112 GTTATTGAACATTTTGGTTGCCGGTAGAGACACGACCGCCGGTTTGTTGTCATTTGTTTTCTACGAGTTG 
1899 GTTGTTGAACATTTTGGTTGCCGGTAGAGACACGACCGCCGGTTTGTTGTCGTTTGTGTT^ 

1366 GTCTTTGAACATCTTGTTGGCAGGAAGAGACACCACTGCTGGGTTGTTGTCCTTTGCTGTGTTO 

1642 AGCGTTGAACGTCTTGCTTGCTGGACGCGACACCACCGCGTCGTTATTATCGTTTGCAACATTT^ 

****** * ******* «> «***# #* ** * *• * 



2114 
2130 
2003 
2111 
1898 
2019 
2058 
1365 
1918 
1641 



2184 
2200 
2073 
2181 
1968 
2089 
2128 
1435 
1988 
1711 



CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



2185 GCTAGACACCCAGAGATGTGGTCCAAGTT6AGAGAAGAAATCGAAGTTAACTTTGGTGTTGGTGAAGACT 2254 
2201 GCCAGAAACCCAGAAGTTACCAACAAGTTGAGAGAAGAAATTGAGGACAAGTTTGGACTCGGTGA^ 2270 

2182 TCAAGAAACCCTGAGGTGTTTGCTAAGTTGAGAGAGGAGGTGGAAAACAGATTTGGACTCGGTGAAGAAG 2251 
1969 TCGAGAAACCCTGAAGTGTTTGCCAAGTTGAGAGAGGAGGTGGAAAACAGATTTGGACTCGGCGAAGAGG 2038 
2090 GCCAGACACCCAGAGATCTGGGCCAAGTTGAGAGAGGAAATTGAACAACAGTTTGGTCTTGGAGAAGACT 2159 
2129 GCCAGAAACCCACACATCTGGGCCAAGTTGAGAGAGGAAATTGAACAGCAGTTTGGTCTTGGAGAAGACT 2198 
1436 GCCAGAAACCCACACATCTGGGCCAAGTTGAGAGAGGAAATTGAACAGCAGTTTGGTCTTGGAGAAGACT 1505 
1989 GCCAGGAACCCACACATCTGGGCCAAGTTGAGAGAGGAAATTGAATCACACTTTGGGCTGGGTGAGGACT 2058 
1712 GCCCGGAATGACCACATGTGGAGGAAGCTACGAGAGGAGGTT ATCCTGA CGATGGGACCG 1771 



**** ** 
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2255 CCCGCGTTGAAGAAATTACCTTCGAAGCCTTGAAGAGATGTGAATACTTGAAGGCTATCCTTAACGAAAC 2324 
2271 CTAGTGTTGAAGACATTTCCTTTGAGTCGTTGAAGTCCTGTGAATACTTGAAGGCTGTTCTCAACGAAAC 2340 
2144 CTCGTGTTGAAGACATTTCCTTTGAGTCGTTGAA6TCATGTGAATACTTGAAGGCTGTTCTCAACGAAAC 2213 
2252 CTCGTGTTGAAGAGATCTCGTTTGAGTCCTTGAAGTCTTGTGAGTACTTGAAGGCTGTCATCTATGAAAC 2321 
2039 CTCGTGTTGAAGAGATCTCTTTTGAGTCCTTGAAGTCCTGTGAGTACTTGAAGGCTGTCATCAATGAAGC 2108 
2160 CTCGTGTTGAAGAGATTACCTTTGAGAGCTTGAAGAGATGTGAGTACTTGAAAGCGTTCCTTAATGAAAC 2229 
2199 CTCGTGTTGAAGAGATTACCTTTGAGAGCTTGAAGAGATGTGAGTACTTGAAAGCGTTCCTTAACGAAAC 2268 
1506 CTCGTGTTGAAGAGATTACCTTTGAGAGCTTGAAGAGATGTGAGTACTTGAAGGCCGTGTTGAACGAAAC 1575 
2059 CTCGTGTTGAASAGATTACCTTTGAGAGCTTGAAGAGATGTGAGTACTTGAAAGCCGTGTTGAACGAAAC 2128 
1772 TCCAG— TGATGAAATAACCGTGGCCGGGTTGAAGAGTTGCCGTTACCTCAAAGCAATCCTAAACGAAAC 1839 



*•* ** ** 
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CYP52A1A 2325 CTOCGTATGTACCCATCTGTTCCTGTCAACTTTA^ 2394 

CYP52A2A 2341 CTTGAGATTGTACCCATCCGTGCCACAGAATTTCAGAGT^^ 2410 

CYP52A2B 2214 TTTGAGATTGTACCCATCCGTGCCACAGAATTTCAGAGTTG^ 2283 

CYP52A3A 2322 CTTGAGATT6TACCCATCGGTTCCMACAACTTTAGAGTTGCT 2391 

CYP52A3B 2109 CTTGASAXTGTACCCATCTGTTCCACACAACTTCAGAGT^^ 2178 

CYP52A5A 2230 CTTGCGTATTTACCOUiGrGTCCCAAGAAACTTCAGAATCG^^ 2299

CYP52A5B 2269 CTTGC3GTGTrCACCCMGrGTCCCAAGAAACTTCAG^ 2338 

CYP52A8A 1576 TTTGAGATTACACCCAAGTGTCCCAAGAAACGCAAGATTTGCGATTAA^ 1645 

CYP52A8B 2129 GTTGAGATTACACCCAAGTGTCCCAAGaAACGCAAGaTT^^ 2198 

CYP52D4A 1840 TCTTCGACTATACCCAAGTGTGCCTAGGAACGCGAGATTTGCTACGAGGAATAC^ 1909 



CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



2395 GGTGGTGCTAACGGTACCGACCCAATCTAOVTTCCTAAAGGCTCCACTGTTGCTTAC^^ 2464 

2411 GGTGGTAAGGACGGGTTGTCTCCrrGTTTTGGTGAGAAAGGGTCAGACCGTTATTTACGGTGTO 2480 

2284 GGTGGTAAGGACGGGTTATCrCCTGTTTTGGTCAGAAAGGGTCAAACCGTTATGTACGG^ 2353 

2392 GGTGGTGAAGATGGATACTCGCCAATTGTCGTCAAGAAGGGTCAAGTTGTCATGTACACTGOT 2461 

2179 GGTGGT/JUIGACGGATGCTCGCCAATTGTTGTCJU^GAAGGGTCAAGTTGTCATGTACACT 2248 

2300 GGTGGTTCAGACGGTACCTCGCCAATCTTGATCa^AAAGGGAGAAGCTGTC 2369 
2339 GGTGGTCCAGACGGTACCCAGCCAATCTTGATCCAAAAGGGAGAAGGTGTGTCGTATGGTATCAACTCTA 2408 

1646 GGTGGCCCCAACGGCAAGGAT(XTATCTTGATCAGGAAGGATGAGGTGGTGCAGTACTOT 1715 

2199 GGTGGCCCOyiCGGCAAGGATCCTATCTTGATaiGAAAGAATGAGGTGGTGCA^ 2268 

1910 GGAGGTCCAfiATGGATCGTrrCCGATTTTGATAAGAAAGGGCCAGCCAGTGGGGTATTTCAT^ 1979 



CYP52A1A 
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CYP52A2B 
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CYP52A8A 
CYP52A8B 
CYP52D4A 



2465 CCCACCGTTTGGAAGAATACTACGGTAAGGACGCTAACGACTTCAGACCAGAAAGATGGTTTGAACCATC 2534 
2481 CCCACAGAAACCCAGCTGTTTACGGTAAGGACGCTCTTGAGTTTAGACCAGAGAGATGGTTTGAGCC^ 2550 
2354 CCCaCAGAAACCCAGCTGTCTACGGTAAGGACGCCCTTGAGTTTAGACCAGAGAGGTGGTTTGArc 2423 
2462 CCa«a«3AGACaaVAGTATCTACGGTGCCGACGCTGACGTCCT 2531 
2249 CCCACAGAGACCCAAGTATCTACGGTGCCGACGCCGACGTCTTCAGACCAGAA;^ 2318 
2370 CTCATTTGGACCXTGTCTATTACGGCCCTOATGCTGCTGAGTTCAGACCAGAGAGATGG^ 2439 
2409 CCCACTTAGATCCTGTCTATTATGGCCCTGATGCTGCTGAGTTCAGACCAGAGAGATGGT^^ 2478 
1716 CTGAGACAAATCCTGCTTATTATGGCGCCGATGCTGCTGATTTT 1785 
2269 CTC^GRCAAATCCTGCTTATTATGGCGCCGATGCTGCTGATTTTAGACCGGAAAGAT^^ 2338 
1980 CACACTTGAATGAGAAGGTATATGGGAATGATAGCCATGTGTTTCGACCGGaGAGATGGGCTGCGTTAGA 2049 



CYP52A2A 2551 GACAAAGAAGCTTGGCTGGGCCTTCCTCCCATTCAACGGTGGTCCAAGAATC^ 2620 

CYP52A2B 2424 GACAAAGAAGCTTGGCTGGGCCTTCCTTCCATTCAACGGTGGTCCAAGAATTTGCTTGGGACAGC^^ 2493 

CYP52A3A 2532 AACTAGAAAGTTGGGCTGGGCATACGTTCCATTCAATGGTGGTCCAAGAATCTGTTTGGGTCAAC^ 2601 

CYP52A3B 2319 AACTAGAAAGTTGGGCTGGGCATAICTTCCATTCAATGGTGGTCCAAGAATCTGTTTGGGT^ 2388 

CYP52A5A 2440 AACCAAAAAGCTCGGCTGGGCTTACTTGCCATTCAACGGTGGTCCAAGAATCTGTTTGGGTCAGCAG^ 2509 

CYP52A5B 2479 AACCAGAAAGCTCGGCTGGGCTTACTTGCO^TTCAACGGTGGGCCACGAATCTGTTTGGGTCAGCAGT^ 2548 

CYP52A8A 1786 AACTAGAAACTTGGGATGGGCTTTCTTGCCATTCAACGGTGGTCCAAGAATCTGTTT^ 1855 

CYP52A8B 2339 AACTAGAAACTTGGGATGGGCTTACTTGCCATTCy^CGGTGGTCCAAGAATCTGCT 2408 

CYP52D4A 2050 GGGCAAGAGTTTGGGCTGGTCGTATCTTCCATTCAACGGCGGCCCGAGAAGCTGCCTTGGTCAG^ 2119 



* «>«■**«*** *•* 



* *«■ *-* 



CYP52A1A 2605 GCCTTGACTGAAGCTTCTTATGTGATCACTAGATTGGCCCAGATGTTTGAAACTGTCTO^TCT 2674 

CYP52A2A 2621 GCCTTGACAGAAGCTTCGTATGTCACTGTCAGGTTGCTCCAGGAGTTTGCAa^CTTGTCTATGGACCC^ 2690 

CYP52A2B 2494 GCCTTGACAGAAGCTTCGTATGTCACTGTCAGATTGCTCCAAGAGTTTGGA^ 2563 

CYP52A3A 2602 GCCTTGACCGAAGCTTCATACGTCACTGTCAGATTGCTCCAGGAGTTTGCACACTTGTCTATGGACCC^ 2671 

CYP52A3B 2389 GCCTTGACTGAAGCTTCATACGTCACTGTCAGATTGCTCCAAGAGTTTGGAAACTTGTCCCTGGATCX7UV 2458 

CYP52A5A 2510 GCCTTGACGGAAGCTGGCTATGTGTTGGTTAGATTGGTGCAAGAGTTCTCCCACGTTAGGCTGGACCCAG 2579 

CYP52A5B 2549 GCCTTGACCGAAGCTGGTTACGTTTTGGTCAGATTGGTGCAAGAGTTCTCCCyvCATTAGGCTGC^ 2618 

CYP52A8A 1856 GCPTTGACTGAAGCCGGTTACGTTTTGGTTAGACTTGTTCAGGAGTTTCCJ^CT 1925 

CYP52A8B 2409 GCTTTGACCGAAGCCGGTTACGTTTTGGTTA6ACTTGTTCAGGAATTCCCTAGCTTGTCACAGGACCCCG 2478 

CYP52D4A 2120 GCAATCCTTGAAGCTTCGTATGTTTTGGCTCGATTGACACAGTGCTACACGAC6ATACAGCTTAG AA 2186 
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2675 GTCTCGAATACCCrKXavCCAAAG^ 2744 

' 2691 ACACCGAATATCCAOTAAGAAAATGTCGCATTTGACCATGTCGCT^^ 2760 

2564 ACACaSAATATCCAOnTVGGAAAATGTCGCATTl^CC^ 2633 

2672 ACACCGAATATCCACCAAAATTGCAGAACACCTTGACCTTGTC 2741 

2459 ACGCTGAGTACCCACCAAAATTGCAGAACACCTTGAC^^ 2528 

2580 ACGAGGTGTACCCGCCAAAGAGGTTGACCAACTTGACCATGT^^ 2649 

2619 ATGAAGTGTAIXXACCAAAGAGGTTGACCAACTTGACCATGTGTTTGC^^ 2688 

1926 AAACCAAGTACCCACCACCTAGATTGGOUyvCITGACGATGTGOT 1995 

2479 AAACTGAGTACCCACCACCTAGATTGGCACACTTGACGATGTGC^^ 2548 

2187 CTiWXGAGTACCCAtXAAAGAAACTCGTTCATCTCACGAT^ 2256 



CYP52A1A
CYP52A2A
CYP52A2B
CYP52A3A
CYP52A3B
CYP52A5A
CYP52A5B
CYP52A8A 
CYP52A8B 
CYP52D4A 



2745 
2761 
2634 
2742 
2529 
2650 
2689 
1996 
2549 
2257 



GTAA-AGTAGTCGATGCTGGCTATTCGATTACArGT--GTATAGGAAGATTTT^^ 2811 

GTATTAGAGGGTCATGTGTTATTTT-GATTGTTTA GTTTGTAATTACTGATTAGGTTAATTCATG 2824 

GTATTAGAGGATCATGTGTTATTTTTGATTGCnTTAGTCTGTCT 2703 
GTACTAAGGTTGCTTTTarrTGCTAATTTTCTT^ 2811 

GTTCTAAGGTTGCTTATCCTTGCTAGTGTTATT TATAGTTTGTGTATTTAAATTGAATCGGCGATTG 2595 

TGACTAGCGGCGTGGTGAATGCCTOTGATTTTGTA GTTTCTGTTTGCAGTAATGAGATAACTATTCA 2716 

TGACTAgTA- CGTA-TGAGTGCGTTTGATTTTGT A GTTTCTGTTTGCAGTAATGAGATAACTATTCA 2753 

GTCATAGCTTTCCC CATACAA6TAGTTCAGTA ^ATTATACACTGTTTTTACTTTCTCTTCATACC 2059 

GCAATA6GTTT TGGTTTGACTTTGTTTCCATA— 2580 

TAGWWOTGATTATGTGTTTATGGTTAATCGGGGauUVGCACTGOJkGT^ 2326 



CYP52A1A 
CYP52A2A
CYP52A2B 
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2812 
2825 
2704 
2812 
2596 
2717 
2754 
2060 
2581 
2327 



TTTTTTTAATTTTTGTTAAATTAG-TTTAGAGATTTCATTAATAaiT 2880 

GATTGTTATTTATTGATAGGGGTT TGCGCGTGTTGCATTCACTTGGGATCGTTCCAGGTTG 2885 

GATrGTTATTTATTGATAGGGGGTGCGTGTGTGTGTGTGTGTTGCATTO^ 2773 

ATTTTT CTGATACCAATAACCGTA GTGCGATTTGACCAAAACCGTTCAAACTTTTTGTTCTC 2873 

ATTTTTCTGGTACTAATAACTGTA GTGGGTTTTGACaWACrGTTCAAACr mrL ' in y m 2657 

GATAAGGCGAGTGGATGTACGTTT-TGTAAGAGTTT — CCT-TACAACCTTGGTGGGG-TGTGTGAGGTT 2781 

GATAAGGCGGGTGGATCTACGTTT-TGT 2817 

AAATGGACAAAAbXi-i-rAAGCATG-CCTAACAACGTGACCG-GACAATTGTGTCGCACT 2127 

TGCAAGT 2587 

AGCATTGGTGTTCCGGAGCATCAATAACCAATGTCTTGAAGGGTTTGATTTTCT^ 2396 



CYP52A1A 
CYP52A2A
CYP52A2B
CYP52A3A
CYP52A3B
CYP52A5A
CYP52A5B
CYP52A8A
CYP52A8B
CYP52D4A



2881 
2886 
2774 
2874 
2658 
2782 
2818 
2128 
2588 
2397 



*"^ACTTCTATCC~~CCTGTATCCCTTATTATCCC^ 2948 
ATCTTTCCTOCATCCT--GTCGAGTCAAAAGGAGT1^ 2953 
TTGTTTCCTTCCATCCT--GTTGAGTCAAAAGGAGTTTTGTTTTGTAACTCCGGACG 2841 

TCGTT6A CG TGCTCGCTCATCAGCACTGTTTGAAGACGAAAGA-GAAAATTTTTTGTA 2930 

TilirCi-iCCCCCTACXTTCGTTGCTCGCTCATaVGCACTGTTTGAAAACGAAA^ 2727 
GAGGTTGCATCTT-GGGGAGATTACACCTTTTG-CAGCTCrCCGTATACA^ 2849 

G CATCTTAG-GGAGAGATAGCACCTTTTG-CAGCTCTCCGTATACAGTTTT^ 2881

ATTGTAAAAATAG-TGTAOWn'AATTTGTGGTGGCCGGAGATAAATTACAGTTTGGTT^ 2196 

ASTTCA GTAA T TACACACTAATTTGTGGTGGCCGGCGATAAATTACCGTTTGGTTTTGTGTAAAAAT 2654 

GAGCTTCTTTCCG--TCAAACTTGTACAGAAT6GCCATCATTTCAGGAAC^ 2463 
2949 CACAAACTCCCTAACGGACITAAACCATAAAC^ 3018

2954 AAQGTaaTCTCCATGTGATTGTTTTGACTGTTACTGTGATT^ GACGTTATA 3016 

2842 AAQGTOGATCTCCATGTGATTGCTT-GACTGCTACTCTGATTATGTAATCT^ 2910 

2931 AACAACACTGTCCJUUITTTAC(XAAC6TGAACCATTATG--CAAATGA^ CTTTCAA 2989 

2728 /UlCAACATT6CCCAAACTTACCaWkCGT6AACCATTATAACaUUVTGAGCGGC(^ CTTTCAA 2788 

2850 TATCAATCATGTGGGGGGGGGGGTTCATTGTTTGGCKaTGGTGGTGCATGTTAAATCCGCC-AACTA^ 2917 

2882 TGCCAATCATCTGG GGATTCATTGlTTGCCKaTGGTGGTGCATGauUVATCCCCCauv^ 2944 

2197 GCGGATATCTCTGGC AGTWTCTTCTCCGC-AGa^GCTTTGCC^^ 2260 

2655 TCGGACATCTCTGGT GGTTTCCCTTCTCCGC-AGCAGCTTTGCCACGGGTTTGCTCTGCGGCC^ 2718 

2464 TACCGCATCTGGAGTA TCTCGCCGTCGTTCAAGTAG--CACGAAAACAGCAACGACGTCACCATCTG 2528 
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CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
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3019 «^TTCTCCAACCAATAGCATGGACAGACCCACXCTCCTAT^^ 3088 

3017 CAAGCATGTGATTGTGGTTTT GCAGCCT-TTTGOMIGACAAATGATCGTCaGawrGAW^ 3079 

2911 CAAGCATGT6ATTCTGGTTTTT -GCAACCTGTTTGCACGAOUUITGATCGACAGTCGATTACGTAA 2975 

2990 CTGCTCGCTGGaWfiCATTCGGG GATATCTACAACGCCCTTAAGTTTGAAACAGACATTGArrTAG 3054

2789 CT6GTCACTGGAGGCASTCGGG -GAZATCTACJUU»C0CTTAAGTTTGAG6AAGACAITGATn 2853 

2918 CAATCTCACATGAAACTC»AGCACACTAAAAAAAAAAAAGATGTTGGGGG3^^ 2986 

2945 CAATCnx:ACATGAAACTCAAfiCACACTAGAAAAAAAA--GATGTTGC^^ 3005 

2261 CAAATTCAAAAGGGGG AG3UUlCTTAACACCCCTTATCTCTCaM:TC-T^ 2318 

2719 CAAMnrCGAAAGGGGOGGGGGGGGGGGAGayU^^ 2785
2529 CTTCCCAATCTTGACACCC ACAGATACCCCTGCGGCTTCATGGATCAAAAACGTCGGCAACC 2590 



CYP52A1A 3089 CCACCTGGAAGCCCCTCAAGCCAOICACGTCATCCAGCCCACCC^^ 3158 

CYP52A2A 3080 TCTTTGTTA GAGGGGTAAAAAAAAACAAAATGGCA{^CAGAATTTCAAACATTCT(K3UU^C^ 3144 

CYP52A2B 2976 TCCATATTAT-TTAGAGGGGTAATAAAAAATAAA-TGGCAGCCAGAATTTCAAACATTTTGC^ 3043 

CYP52A3A 3055 ACACCATAGA-TTTCAGCGGCATCAAGAATGACC TTGCCCACATTTTGACGACCCCAACACCACTG 3119 

CYP52A3B 2854 ACACCATAGA-TTTCAGCGGCATCAAGAATGACC TTGTCCACATTTTGACAACCCCAACACCACTG 2918 

CYP52A5A 2987 TTAGTAArr--AAACACT<nXyiCTCTCACTCTCACTCTCTCC^^ 3054 

CYP52A5B 3006 TTGGGGAAA--ACTTTCGTTTCCTTTCTCAGTAATTAAACGTTC^ 3073

CYP52A8A 2319 CTTGTGGGG--ATGCAATTGTCGTACGTTTTTTATGTTTTGTCTAGACT 2386 

CYP52A8B 2786 CTTGTGGGGGGATGTAATTGTCGTACGTTTTC-ATGTTTGGCCCAGACT^^ 2854 

CYP52D4A 2591 CX:GCGTATATGTCCATGTAATTCTCCATGGCXACXT--CCflTCA^ 2658 



CYP52A1A 3159 GTCCaUU^GACGGCGAGTTCTGGTGTGCCCGGAAATCAGCCATCCCGGCCACATACAAGC^ 3228 

CYP52A2A 3145 CAAAAAATGGGAAACTC — CaACAGACAAAA-TU^AAAAACTCCGCAGCACTCCGAACCCACAGAACAATG 3211 

CYP52A2B 3044 CAAAAGATGAGAAACTC — CAACAGAAAAAATAAAAAAACTCCGCAGCACTCCGAACCAACAAAACAATG 3111 

CYP52A3A 3120 GAAGAATCACGCCAGA ^AACTAGGO^TGGATCCAAGCCTGTGACCTTGCCCAATGGAGACGAAGTG 3185 

CYP52A3B 2919 GAAGAATCGCGCCAGA AACTAGGCGATGGATCOAGCCTGTGGCXnTGCCCAATGGAGACGAAGTG 2984 

CYP52A5A 3055 AGACAACCAGAWUUUUUVAGAACAAAATCCAGATAGAAAAACAAAGGGCT- 3122 

CYP52A5B 3074 AGACAACXaCftAAAAA CAAAATCCaGATAGAAGAAGAAAGGGCT-GGACAACCATAAAT-AAAC 3135 

CYP52A8A 2387 TTATGTCTGAGGCGTG -CTTGAAAGAAGTGTCAAAATGTGACAGGCG-ACGCTATTCGJ«yVT-GAAC 2450 

CYP52A8B 2855 TTATGTCTAAGGCGTG CTTGACACAAGTGTGAAAAGGTGACAGGCG-ACGTTATTCGACAT-GAAC 2918 

CYP52D4A 2659 CCACCACTGCCCTCGG TT6AGTCAAGGCAGTATGATGCCGGGATCCAGTACTCCAATGGGAACC 2722 



CYP52A1A 3229 GCGTGCATACTCGGCGAGCCCACAATGGGAGCCACGCATTCGGACCATGAAGCAAAGTACATTCAC^^ 3298 

CYP52A2A 3212 GGG CGCCAGAATTATTGACTATTGTGACTTTTTTA CGCTAACGCTCATTGCAGTG 3266 

CYP52A2B 3112 GGGG~-GCGCCAGAATTATTGACTATTGTGACTTTTTTTT--ATT^^ 3177 

CYP52A3A 3186 GAGTTGAACCy^AGCGTTCCTAGAAGTTACCACATTATTGTCGAATGAGTTTGACTTGG^ 3255 

CYP52A3B 2985 GAGTTGAACCAAGCGTTCCTAGAAGTTACCyiCATTATTGTCGaACGAGTTT^ 3054

CYP52A5A 3123 AATCrTAGGGTCTACTCCATCTTCCACrrGTTTCTTCTTCTTCAGACTTAGCT-AAC^ 3191 

CYP52A5B 3136 AACCTAGGGTCCACTCCATCTTTCACT TCTTCTTCTTCAGACTTATCT-AAiCAAACGACTCaCTTCA 3201 

CYP52A8A 2451 GC6AAAGGGTTATTTGCATCAATACGAG--GGGCTGACTCTAGTCTAGG ATGGCAGTCCTAGGTTGC 2515 

CYP52A8B 2919 GCAAAAGGGTAATTTGCATCGATACGAG— GGGTTGCCTCTGGTCTAAG AAGGACCCCCCAGGTTGC 2983 

CYP52D4A 2723 TCT GCACGGTGTCGCTGCAGTTTTTGAGGCGTATTTCGA TCCATGATCGTTCTTTGG 2779 



CYP52A1A 3299 TCACGGGTCTTTCAG-TGTCGCAGATTGAGAAGTTCGACGATGGATGGAAGTACGATCTCGT^^ 3367

CYP52A2A 3267 TAGTGCGTCTTACACGG GGTATTGCTTTCTACAATGCAAGGGCA-CAGTTGAAGGTTTGCACC 3328 

CYP52A2B 3178 AAGTGTGTTACACGGGGTGGTGATGGTGTTGGTTTCTACAATGOUVGGGCA-CAGTTGAAGGTTTCC^ 3246 

CYP52A3A 3256 CGGCAGAGTTGTTATACTA-CGCTGGCGACATATCCTACAAGAAGGGCACATCAATCGCAGACAGTGCCA 3324 

CYP52A3B 3055 CGGCCGAGTTGTTATACTA-CGCCGGCGACATATCCTACAAGAAGGGCACATCAATTGCCGACAGTGCCA 3123

CYP52A5A 3192 CCATGGATTACGCAGGCATCACGCGTGGCTCCATCAGAGGK:GAGGCCTTGAAGAAACTCG — CAGAATT 3258 

CYP52A5B 3202 CCATGGATTACGCAGGTATCACGCGTGGGTC(y^TCAGAGG-CGAAGCCTTGAAGAAACTCG--CCGAGTT 3268 

CYP52A8A 2516 AAACATGTTGCACCA-TATCCCTCCTGGAGTTGGTCGAC — CTCGCCTACGCC-ACCCTCA— GCGATCG 2579 

CYP52A8B 2984 AAACATGTTGCACTG-CATCCaiCTCAGAGTTGGTCGAC--CACGCCTACGCTTACCCTCA — GCGATCG 3048 

CYP52D4A 2780 T6CTGTAGTATAACGA6CT--CTTGGTGTCCTTGAAATGGAACAGGTTGGATGTGTTGTTGAGTTTGTCT 2847
CYP52A5A
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



3368 
3329 
3247 
3325 
3124 
3259 
3269 
2580 
3049 
2848 



G~AGTAACTTCCTAAGCTCGAATTATGC 

TAACGTTGCACCATATCAACTCAATTTATC CTCATTCAT6TGATAAAAGAAGAGCCAAA 

GATTGTCTTATTATTTGAGAGCAAACTAC AKrTTGAACATACTTGGGTATTTGAT 

GATICTCTTACTATTTGAGAGCAAACTAC ^AICTTGAACATACTTGGGTACTTTAT 

G-ACCATCCAGAACCAGCCATCCAGCT TGAAAGAAATCAACACC66CATCCA6AAGGACGACTT 

G-ACCATCCAGAACCAGCCATCCAGCT TGAAAGAAATCAACACCGGCaVTCCAGAAGGACGACTT 

Ga^TTTCCGTTGTTCAATATTTCTC CTTCCCATTGTTCCAfiGGGTTA— TC 

GCACTTTCCGTTGCTCAATATTTCTCT CCCCCCTGCTTCC0CCCATTGTTCCAG6GATTA TC 

GCGTGCTTGGmGCAAGTCTTCGATCG AGCGTAGTGAGTAGACAGTTGGCGGG 



3437 
3385 
3305 
3375 
3178 
3321 
3331 
2629 
3110 
2901 



CYP52A1A 
CYP52A2A
CYP52A2B
CYP52A3A
CYP52A3B
CYP52A5A 
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



3438 
3386 
3306 
3380 
3179 
3322 
3332 
2630 
3111 
2902 



TGCGTACGTCATGAGTGTGCCTTTTGATGGACCCAAGGAGGAAGGTTACGTGGITGGGACGTACAGATC^ 

AGCT-CGTGCGTCAAC(rrATGTGCAGGAAAGAAAAAATCCAAAAA--AATCGAAA-ATGCGACTl^ 

AGGT-AAT-TGGCAGACCCCCCAAGGGGAACACGGAGTAGAAAGC— AATGGAAACACGCCCATGACAGT 

TTCG-AAGCAGCGATTGGATTTGATAGTCACGGACAACGACGCGT--TGTTTGATA6TATTT^ 

TTCG-AAGCAGCGATTGGATGTGATAGTCACCGACAACAAaM:GT--TGTTTGATAATATa^ 

TGCC-AAGTTGTTGTCrGCCaCCCCGAAAATCCCCACCAAGCACA--AGTTGAACGGCAACCACGAATO 

TGCC-AAGTTGTTGTCTTCCACCCCGAAAATCCACACXaAGaVCA-~AGTTGAATGGCAACCA^ 

AACA-ACGTTGCCGGCCTCCTC CCCAAATTA CAAGAAAAATAAATT- 

AACA-ACGTTGCCGGTCTCCTCTCCCCCCCCTCCCCCCAGTTAT -GTACAAGAAAATTAAATT- 

GKrGGTGGCTCGGGCTTTATTCTGTGTTTGTGTTT^^ 



3507 
3451 
3371 
3446 
3245 
3387 
3397 
2674 
3171 
2969 



CYP52A1A 3508 ATTGAAAGGTTGAGCTGGGGTAAAGACGGGGACGTGGA-GTGGACCATGG CGACGACGTCGGATCCT 3573 

CYP52A2A 3452 TTTGAATAAACCAAAAAGRAAAATGTCGCACTTTTTTC TCGCTCTCGCTCTCTCGACCCAAATCA 3516 

CYP52A2B 3372 GCCATTTA6CCCACA ^ACACATCTAGTATTCTTTTT TTTTTTTGTGCGCAGGTGCACACCTGG 3433 

CYP52A3A 3447 TTTGAAAAGATCTAC AAGTTGATAAGCGTGTTGA ^ACGATATGATTGACAAGCAAAAGGTGA 3507 

CYP52A3B 3246 TTTGAAAAGATCTAC AAGTTGATAAGCGCGTTGA ACGATATGATTGACAAGCAAAAGGTGA 3306 

CYP52A5A 3388 GTCTGAGGTCGCCATT6CCAAAAAGGAGTACGAGGTGTTGATTGCCTTGAGCGACGCCACAAAAGACCCA 3457

CYP52A5B 3398 GTCCGMU3TCGOCATTGCCAAAAAGGAGTACG3U»3TGTTGATTGCCTTGAGCGACGC^ 3467

CYP52A8A 2675 GTC6CACGGCACCGATCTGTCAAAGATACAGATAA ^ACCTTAAATCTGCAAAAACAAGACCCC 2736

CYP52A8B 3172 GTCGCACGGCACCGATACGTCAAAGATACAGAGAA ACCTTAA TCC 3216 

CYP52D4A 2970 GGTTCGTAGTATAAGTAGCGCCAATATGAGAATGTATA TCCGCATCACCCAAGACTCTTCAGCCT 3034 



CYP52A1A 
CYP52A2A
CYP52A2B
CYP52A3A
CYP52A3B
CYP52A5A
CYP52A5B
CYP52A8A
CYP52A8B
CYP52D4A 



3574 GGTGGGTTTATCCC6CA-ATGGATAACTCGATTGAGCA-TCCCTGGAGCAATCGCAAAAGATGTGCCTAG 3641 
3517 CAACAAATCCTCGCGCGCAGTATTTCGACGAAAC--CACAACAAATAAAAAAAACAAATTCTACACCACT 3584 
3434 ACTTTAGTTATTGCCC-CATAAAGTTAACAATCT--CACCTTTGGCaCTCCCAGTGTCTCCGCCT 3500 
3508 CAAGCGACATCAACAGTCTAGCATTCATCAATTG~--CATCAACTACTCGaGAGGTCAACTATTCT^ 3575 
3307 CAAGCGACATCAACAGTCTAGCATTTATCAACTG—CATCAACTACTCGAGGGGTCAACTATTCTCCGCA 3374 

3458 ATCAAAGTGACCTCCCAGATCAAGATCTTGATTGACAAGTTCAAGGTGTACTTGT TTGAGTTGCCT6 3524 

3468 ATCAAAGTCACCTCCCAGATCAAGATCTTGATTGACAAGTTCAAG6T6TACTTGT TTGAGTTGCCCG 3534 

2737 TCCCCATAGCCTAGAAGCACCAGCAAGATGATGGAGCAACTCCTCCAGTACTGGTACATCGCACTCTCTG 2806 
3217 CTCCCATAGCCTAGAAGCATCAAAAAGATGATTGAGCAACTCCTCCAGTACTGGTACATTGCACTCOCTG 3286 
3035 GTTACAACGACTGAGGCTGTTGGCCGTGTGACCAATTGGTTTCTTTGGTGACCTAGATTGGTCCCGCAGG 3104 



CYP52A1A 3642 TG TATTAAACTACATACAGAAATAAAAACGTGTCTTGATTCATTGGTTT GGTTCTTGTTGGGTT 3705 

CYP52A2A 3585 T CTTTTTCTTCACCAGTCAACTUVAAAACAACAAATTATACACCATTTCAACGATTTTTGCT^ 3650 

CYP52A2B 3501 TG CTCGTTTTACACCCTCGAGCTAACGACAACACAACACCCATGAGGGGAATGGGCAAAGTT 3562

CYP52A3A 3576 OA CGAACTTTTGGG-ACTGGTTTTGTTTGGATTGGTCGACATCTATTTCAACCAGTTTGGCACATTA 3641 

CYP52A3B 3375 OA CGAACTTTTGGG-ACTGGTTTTGTTTGGATTGGTTGACAACTATTTCAACCAGTTTGGCTC^^ 3440

CYP52A5A 3525 AC CAGAAGTTCTCCTACTCCATCGTGTCCAACTCCGTCAACATCGCCCCC-TGGACCTTGCTCGGGG 3590 

CYP52A5B 3535 AC CAGAAGTTCTCCTACTCCATCGTGTCCAACTCCGTTAACATTGCCCCC-TGGACCTTGCTCGGTG 3600 

CYP52A8A 2807 TA TGGTTCATCCTTCGCTACTTGGCTTCCCACGCACGAGCCGTCTACTTG-CGCCACAAGCTCGGCG 2872 

CYP52A8B 3287 TA TGGTTCATTCTCCGCTACGTGGCTTCCCACGCACGAACCATCTACTTG-CGCCACAAGCTCGGCG 3352 

CYP52D4A 3105 GAAAGCAAGGGCTGCTAGGGGGGCATACCAAACAAGGTCGTGTAATCAGTATCTATGGTGCTACCATGTG 3174 
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CYP52A1A 
CYP52A2A 
CYP52A2B 
CYP52A3A 
CYP52A3B 
CYP52A5A
CYP52A5B 
CYP52A8A 
CYP52A8B 
CYP52D4A 



3706 CCGAGCCAATATTTCACATCATCTCCTAAATTCTCC^ 3775 

3651 AAATG CTATA TAATGGTTTitflTTC3lACTCAGGTATGTTTAT-T^ 3717 

3563 AAACACTTTTGGTTTCAATGATTCCTATTTGCTACTCTC^ 3630 

3642 GACAACTACAAGAiVGGTArrGGCATTGATACTGAAGAACATCaGCGATGAAGACATC^^ 3709 

3441 GACAACTACAAGAAAGTATTGGCftTTGATACn^AAGAACATCA^ 3508 

3601 AGAAGTTGACCACGGGCrrTGATCAACnTGGCGTTC(»^^ 3669 

2873 CGGCGCCATTOVCGCACACCCAGTACGACGGCTGGTATGGGTTCAAGTTTG 2940 

3353 CGGCGCCGTTCACGCACACCCAGTACGACGGATGGTATGGGTTCAAGT^ 3420 

3175 TGTGGTTGGGGGGAAATTCCCGCATTTTTGTGTAACXAAAGTTCTAGAAA 3243 



CYP52A1A 3776 CTGAGATCTTATTTAATATCGACTTCTCAACCACCGGTGGAATC--CCGTTC^^ 3843 

CYP52A2A 3718 OUUVTACTAACTACTTTTGATGrrrGTCGCTTTTCTAGAAT^ 3787 

CYP52A2B 3631 AAATAAACGACAATTATATATACCTTT TCGTCTGTCCTC CAATGTCT-CTTTTTGCTGCCATT 3692 

CYP52A3A 3710 CTTCCTCCCATCGACACTACAATTGTTTAAGCTGGTGTTGGACAA-GAAA^^ 3778 

CYP52A3B 3509 CTTCCTCCCATCGACACTACAATTGTTTAAGCTGGTGTTGGATAA--^^ 3577 

CYP52A5A 3660 ACATCTTCAACGAGTTCATCGACAA6TTCTTTGGCAACACGGAG — CCGCAATTGAC CAACTTCT 3722 

CYP52A5B 3670 ACATCTTCAACGAGTTCATCGACAAGTTCrTTGGCAACACAGAG — CCGCAATTGAC CAACTTCT 3732 

CYP52A8A 2941 GCGAAGAAGATCGGGCGGCAGACGGACTTGGTGCATGCGCGGTT — CCGTGGCGG CATGGACA 3001 

CYP52A8B 3421 GCGAAGAAGATTGGAAGGCAGACGGACTTGGTGCATGCGCGGTT— CCGTGGAGGGGG CATGGATA 3484 

CYP52D4A 3244 ATCTGCTGGAACCATCCACCCGCATTTCCGTTGCCAAA6TGGGAA-GAGCAATCAACCCACCCTGCTTTG 3312 



CYP52A1A 3844 GTGTGTTTGCTCTTGTTCTTGATGACAATGATGTATTTGTCACGATACCTGAAATAATAAAACATCCAGT 3913

CYP52A2A 3788 GTCGAATAGACGGTTTGTTTACTCATTAGATGGTCCCAGATTACTTTTCAAGCCAAAGTCTCT-CGAGTT 3856 

CYP52A2B 3693 TTGCTTTTTGCTTTTTGCTTTTGCACT CTCTCCCACTCCCACAATCA6TGCAGCAACACA-CAA 3755 

CYP52A3A 3779 GTTCTnCAAGTACATCACTTCAflCAGT— GTCACGAGACTACAACTCCAACATCGGCTCCACAGCCAAAG 3846 

CYP52A3B 3578 GTTCTACaAGTACATCACCTCAACAGT--GTC6CAAGACTACAACTCCAACATCGGAGCCACAGCCAAAG 3645 

CYP52A5A 3723 TGACCTT6TGCGGTGTGTTGGACGGGTTGATTGACCAT6CC-AACTTCTT6AGCGTGTCCTCGCGGACCT 3791 

CYP52A5B 3733 TGACCTTGTCCGGTGTGTTGGACGGGTTGATTGACCATGCC-AACTTCTTGAGCGTGTCCTCCAGGACCT 3801 

CYP52A8A 3002 CCTTCTCGAGCTACACTTTCGGCATCCATATCATCCTTACC-CGGGACCCGGAGAACATCAAGGCGGT^ 3070 

CYP52A8B 3485 CTTTCTCGAGCTATACTTTCGGCATCCATATCATTCTTACT-CGGGACCCGGAGAACATCAAGGCGGTCT 3553 

CYP52D4A 3313 CCCAATCAGCCATTCCCCTGGGAATATAAATTCAAC 3348 



CYP52A1A 3914 CATTGAGCTTATTACTCGTGAACTTATGAAAGAACTCATTCAAGCCGTTCCCAAAAAACCCAGAATTGAA 3983 

CYP52A2A 3857 TTGTTTGCTGTTTCCCCAATTCCTAACTATGAAGGGTTTTTATAAGGTCCAAAGACCCCAAGGCATAGTT 3926 

CYP52A2B 3756 3755 

CYP52A3A 3847 ATGATATCGATTTGTCCAAAACCAAACTCAGTGGCTTTGAGGTGTTGACGAGTT 3900 

CYP52A3B 3646 ATGATATCGATTTGTCCAAAGCC 3668 

CYP52A5A 3792 TCAAGATCTTCTTGAACTTGGACTCGTATGTGGAC 3826 

CYP52A5B 3802 TCAAGATCTTCTTGAACTTGGACTCGTTTGTGGACAACTCGGACTTCTTGAACGACGTGGAGAACTACTC 3871 

CYP52A8A 3071 TGGCGACGCAGTTCGATGACTTCTCGCTCGGTGGCAGGATCAGGTTCTTGAAGCCGTTGTTGGGGTATGG 3140 

CYP52A8B 3554 TGGCGACGCAGTTCGATGACTTTTCG 3579 

CYP52D4A 3349 3348 



CYP52A1A 3984 GATCTTGCTCAACTGGTCATGCAAGTAGTAGATCGCCATGATCTGATACTTTACCAAGCTATCCTCTCCA 4053 

CYP52A2A 3927 TTTTTGGTTCCTTCTTGTCGTG 3948 

CYP52A2B 3756 3755 

CYP52A3A 3901 3900 

CYP52A3B 3669 3668 

CYP52A5A 3827 3826 

CYP52A5B 3872 CGACTTTTTGTACGACGAGCCGAACGAGTACCAGAACTT 3910 

CYP52A8A 3141 GATATTCACGTT 3152 

CYP52A8B 3580 3579 

CYP52D4A 3349 3348 
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CyP52AlA 4054 AGTTCTCOCWOTACMGCAAGTACGGCAACGAGCTCTGGAAGCTTT^^ 4115 

CYP52A2A 3949 3948 

CYP52A2B 3756 3755 

CYP52A3A 3901 3900 

CYP52A3B 3669 3668 

CYP52A5A 3827 3826 

CYP52A5B 3911 3910 

CYP52A8A 3153 3152 

CYP52A8B 3580 3579 

CYP52D4A 3349 3348 

CYP52A1A 1 MATQEIIDSVLPYL TKWYTVITAAVLVFLISTNIKNYV 38 

CYP52A2A 1 

CYP52A2B 1 MTAQDIIATY ITKWYVIVPLALIAYRVLDYFYGRY 35 

CYP52A3A 1 MSSSPSFAQEVLATTSPYIEYFLDNYTRWYYFIPLVLLSLNFISLLHTRY 50 

CYP52A3B 1 MSSSPSFAQEVLATTSPYIEYFLDNYTRWYYFIPLVLLSLNFISLLHTKY 50 

CYP52A5A 1 30 

CYP52A5B 1 30 

CYP52A8A 1 30 

CYP52A8B 1 MLDQIFHY WYIVLPLLVIIKQIVAHARTNY 30 

CYP52D4A 1 32 



CYP52A1A 39 KAKKLKCVDPPYLKDAGLTGILSLIAAIKAKNDGRLANFAD EVFDEY 85 

CYP52A2A 36 LMYKLGAKPFFQKQTDGCFGFKAPLELLKKKSDGTLIDFTL QRIHDL 82 

CYP52A2B 36 LMYKLGAKPFFQKQTDGYFGFKAPLELLKKKSDGTLIDFTL ERIQAL 82 

CYP52A3A 51 LERRFHAKPLGNFVRDPTFGIATPLLLIYLKSKGTVMKFAWGLWNNKYIV 100 

CYP52A3B 51 LERRFHAKPLGNWLDPTFGIATPLILIYLKSKGTVMKFAWSFWNNKYIV 100 

CYP52A5A 31 LMKKLGAAPVTNKLYDNAFGIVNGWKALQFKKEGRAQEYND YKFDHS 77 

CYP52A5B 31 LMKQLGAAPITNQLYDNVFGIVNGWKALQFKKEGRAQEYND HKFDSS 77 

CYP52A8A 31 LMKKLGAKPFTHVQRDGWLGFKFGREFLKAKSAGRLVDLII SRFHDN 77 

CYP52A8B 31 LMKKLGAKPFTHVQLDGWFGFKFGREFLKAKSAGRQVDLII SRFHDN 77 

CYP52D4A 33 LMHKHGAREIENVINDGFFGFRLPLLLMRASNEGRLIEFSV KRFESA 79 



CYP52A1A 86 PN'-HTFYLSVAGALKIVMTVDPENIKAVLATQFTDFSLGTRHAHFAPLL 133 

CYP52A2A 83 DRPDIPTFTFPVFSINLVNTLEPENIKAILATQFNDFSLGTRHSHFAPLL 132 

CYP52A2B 83 NRPDIPTFTFPIFSINLISTLEPENIKAILATQFNDFSLGTRHSHFAPLL 132 

CYP52A3A 101 RDPKYKTTGLRIVGLPLIETMDPENIKAVLATQFNDFSLGTRHDFLYSLL 150 

CYP52A3B 101 KDPKYKTTGLRIVGLPLIETIDPENIKAVLATQFNDFSLGTRHDFLYSLL 150 

CYP52A5A 78 KNPSVGTYVSILFGTRIWTKDPENIKAILATQFGDFSLGKRHTLFKPLL 127 

CYP52A5B 78 KNPSVGTYVSILFGTKIWTKDPENIKAILATQFGDFSLGKRHALFKPLL 127 

CYP52A8A 78 ED TFSSYAFGNHWFTRDPENIKALLATQFGDFSLGSRVKFFKPLL 123 

CYP52A8B 78 ED TFSSYAFGNHWFTRDPENIKALLATQFGDFSLGSRVKFFKPLL 123 

CYP52D4A 80 PHPQNKTLVNRALSVPVILTKDPVNIKAMLSTQFDDFSLGLRLHQFAPLL 129 



-k-k-k-k k kkfr kkkk-k k 



CYP52A1A 134 GDGIFTLDGEGWKHSRAMLRPQFARDQIGHVKALEPHIQIMAKQIKLNQG 183 

CYP52A2A 133 GDGIFTLDGAGWKHSRSMLRPQFAREQISHVKLLEPHVQVFFKHVRKAQG 182 

CYP52A2B 133 GDGIFTLDGAGWKHSRSMLRPQFAREQISHVKLLEPHMQVFFKHVRKAQG 182 

CYP52A3A 151 GDGIFTLDGAGWKHSRTMLRPQFAREQVSHVKLLEPHVQVFFKHVRKHRG 200 

CYP52A3B 151 GDGIFTLDGAGWKHSRTMLRPQFAREQVSHVKLLEPHVQVFFKHVRKHRG 200 

CYP52A5A 128 GDGIFTLDGEGWKHSRAMLRPQFAREQVAHVTSLEPHFQLLKKHILKHKG 177 

CYP52A5B 128 GDGIFTLDGEGWKHSRSMLRPQFAREQVAHVTSLEPHFQLLKKHILKHKG 177 

CYP52A8A 124 GYGIFTLDAEGWKHSRAMLRPQFAREQVAHVTSLEPHFQLLKKHILKHKG 173 

CYP52A8B 124 GYGIFTLDGEGWKHSRAMLRPQFAREQVAHVTSLEPHFQLLKKHILKHKG 173 

CYP52D4A 130 GKGIFTLDGPEWKQSRSMLRPQFAKDRVSHILDLEPHFVLLRKHIDGHNG 179 
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CYP52A1A 184 KTFDIQELFFRFTVDTATEFLFGESVHSLYDEKLGIPTP-NEIPGRENFA 232 

CYP52A2A 183 KTFDIQELFFRLTVDSATEFLFGESVESLRDESIGMSINALDFDGKAGFA 232 

CYP52A2B 183 KTFDIQELFFRLTVDSATEFLFGESVESLRDESIGMSINALDFDGKAGFA 232 

CYP52A3A 201 QTFDIQELFFRLTVDSATEFLFGESAESLRDESIGLTPTTKDFDGRRDFA 250 

CYP52A3B 201 QTFDIQELFFRLTVDSATEFLFGESAESLRDDSVGLTPTTKDFEGRGDFA 250 

CYP52A5A 178 EYFDIQELFFRFTVDSATEFLFGESVHSLKDESIGINQDDIDFAGRKDFA 227 

CYP52A5B 178 EYFDIQELFFRFTVDSATEFLFGESVHSLKDETIGINQDDIDFAGRKDFA 227 

CYP52A8A 174 EYFDIQELFFRFTVDSATEFLFGESVHSLKDEEIGYDTKDMSEERRR-FA 222 

CYP52A8B 174 EYFDIQELFFRFTVDSATEFLFGESVHSLRDEEIGYDTKDMAEERRK-FA 222 

CYP52D4A 180 DYFDIQELYFRFSMDVATGFLFGESVGSLKDE D ARFL 216 



CYP52A1A 233 AAFNVSQHYLATRSYSQTFYFLTNPKEFRIXNAKVHHLAKYFVNKALNFT 282 

CYP52A2A 233 DAFNYSQNYLASRAVMQQLYWVLNGKKFKECNAKVHKFADYYVNKALDLT 282 

CYP52A2B 233 DAFNYSQNYLASRAVMQQLYWVLNGKKFKECNAKVHKFADYYVSKALDLT 282 

CYP52A3A 251 DAFNYSQTYQAYRFLLQQMYWILNGSEFRKSIAWHKFADHYVQKALELT 300 

CYP52A3B 251 DAFNYSQTYQAYRFLLQQMYWILNGAEFRKSIAIVHKFADHYVQKALELT 300 

CYP52A5A 228 ESFNKAQEYLAIRTLVQTFYWLVNNKEFRDCTKLVHKFTNYYVQKALDAS 277 

CYP52A5B 228 ESFNKAQEYLSIRILVQTFYWLINNKEFRDCTKLVHKFTNYYVQKALDAT 277 

CYP52A8A 223 DAFNKSQVYVATRVALQNLYWLVNNKEFKECNDIVHKFTNYYVQKALDAT 272 

CYP52A8B 223 DAFNKSQVYLSTRVALQTLYWLVNNKEFKECNDIVHKFTNYYVQKALDAT 272 

CYP52D4A 217 EAFNESQKYLATRATLHELYFLCDGFRFRQYNKWRKFCSQCVHKALDVA 266 



CYP52A1A 283 PEELEEKSKSGYVFLYELVKQTRDPKVLQDQLLNIMVAGRDTTAGLLSFA 332 

CYP52A2A 283 PEQLE-K-QDGYVFLYELVKQTRDKQVLRDQLLNIMVAGRDTTAGLLSFV 330 

CYP52A2B 283 PEQLE-K-QDGYVFLYELVKQTRDRQVLRDQLLNIMVAGRDTTAGLLSFV 330 

CYP52A3A 301 DDDLQ-K-QDGYVFLYELAKQTRDPKVLRDQLLNILVAGRDTTAGLLSFV 348 

CYP52A3B 301 DDDLQ-K-QDGYVFLYELAKQTRDPKVLRDQLLNILVAGRDTTAGLLSFV 348 

CYP52A5A 278 PEELE-K-QSGYVFLYELVKQTRDPNVLRDQSLNILLAGRDTTAGLLSFA 325 

CYP52A5B 278 PEELE-K-QGGYVFLYELVKQTRDPKVLRDQSLNILLAGRDTTAGLLSFA 325 

CYP52A8A 273 PEELE-K-QGGYVFLYELVKQTRDPKVLRDQSLNILLAGRDTTAGLLSFA 320 

CYP52A8B 273 PEELE-K-QGGYVFLYELAKQTKDPNVLRDQSLNILLAGRDTTAGLLSFA 320 

CYP52D4A 267 PEDTS EYVFLRELVKHTRDPWLQDQALNVLLAGRDTTASLLSFA 311 



★ * kit -k-k 
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CYP52A1A 333 LFELARHPEMWSKLREEIEVNFGVGEDSRVEEITFEALKRCEYLKAILNE 382 

CYP52A2A 331 FFELARNPEVTNKLREEIEDKFGLGENASVEDISFESLKSCEYLKAVLNE 380 

CYP52A2B 331 FFELARNPEVTNKLREEIEDKFGLGENARVEDISFESLKSCEYLKAVLNE 380 

CYP52A3A 349 FYELSRNPEVFAKLREEVENRFGLGEEARVEEISFESLKSCEYLKAVINE 398 

CYP52A3B 349 FYELSRNPEVFAKLREEVENRFGLGEEARVEEISFESLKSCEYLKAVINE 398 

CYP52A5A 326 VFELARHPEIWAKLREEIEQQFGLGEDSRVEEITFESLKRCEYLKAFLNE 375 

CYP52A5B 326 VFELARNPHIWAKLREEIEQQFGLGEDSRVEEITFESLKRCEYLKAFLNE 375 

CYP52A8A 321 VFELARNPHIWAKLREEIEQQFGLGEDSRVEEITFESLKRCEYLKAVLNE 370 

CYP52A8B 321 VFELARNPHIWAKLREEIESHFGLGEDSRVEEITFESLKRCEYLKAVLNE 370 

CYP52D4A 312 TFELARNDHMWRKLREEVILTMGPSSD EITVAGLKSCRYLKAILNE 357 
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CYP52A1A 383 TLRMYPSVPVNFRTATRDTTLPRGGGANGTDPIYIPKGSTVAYWYKTHR 432 

CYP52A2A 381 TLRLYPSVPQNFRVATKNTTLPRGGGKDGLSPVLVRKGQTVIYGVYAAHR 430 

CYP52A2B 381 TLRLYPSVPQNFRVATKNTTLPRGGGKDGLSPVLVRKGQTVMYGVYAAHR 430 

CYP52A3A 399 TLRLYPSVPHNFRVATRNTTLPRGGGEDGYSPIWKKGQWMYTVIATHR 448 

CYP52A3B 399 ALRLYPSVPHNFRVATRNTTLPRGGGKDGCSPIWKKGQWMYTVIGTHR 448 

CYP52A5A 376 TLRIYPSVPRNFRIATKNTTLPRGGGSDGTSPILIQKGEAVSYGINSTHL 425 

CYP52A5B 376 TLRVYPSVPRNFRIATKNTTLPRGGGPDGTQPILIQKGEGVSYGINSTHL 425 

CYP52A8A 371 TLRLHPSVPRNARFAIKDTTLPRGGGPNGKDPILIRKDEWQYSISATQT 420 

CYP52A8B 371 TLRLHPSVPRNARFAIKDTTLPRGGGPNGKDPILIRKNEWQYSISATQT 420 

CYP52D4A 358 TLRLYPSVPRNARFATRNTTLPRGGGPDGSFPILIRKGQPVGYFICATHL 407 

** ***4r * * ★ ******** * * * * * 

CYP52A1A 433 LEEYYGKDANDFRPERWFEPSTKKLGWAYVPFNGGPRVCLGQQFALTEAS 482 

CYP52A2A 431 NPAVYGKDALEFRPERWFEPETKKLGWAFLPFNGGPRICLGQQFALTEAS 480 

CYP52A2B 431 NPAVYGKDALEFRPERWFEPETKKLGWAFLPFNGGPRICLGQQFALTEAS 480 

CYP52A3A 449 DPSIYGADADVFRPERWFEPETRKLGWAYVPFNGGPRICLGQQFALTEAS 498 

CYP52A3B 449 DPSIYGADADVFRPERWFEPETRKLGWAYVPFNGGPRICLGQQFALTEAS 498 

CYP52A5A 426 DPVYYGPDAAEFRPERWFEPSTKKLGWAYLPFNGGPRICLGQQFALTEAG 475 

CYP52A5B 426 DPVYYGPDAAEFRPERWFEPSTRKLGWAYLPFNGGPRICLGQQFALTEAG 475 

CYP52A8A 421 NPAYYGADAADFRPERWFEPSTRNLGWAFLPFNGGPRICLGQQFALTEAG 470 

CYP52A8B 421 NPAYYGADAADFRPERWFEPSTRNLGWAYLPFNGGPRICLGQQFALTEAG 470 

CYP52D4A 408 NEKVYGNDSHVFRPERWAALEGKSLGWSYLPFNGGPRSCLGQQFAILEAS 457 

** *^ ****** ^ irir* ^ ^ ^******* *******^ ** 

CYP52A1A 483 YVITRLAQMFETVSSDPGLEYPPPKCIHLTMSHNDGVFVKM 523 

CYP52A2A 481 YVTVRLLQEFAHLSMDPDTEYPPKKMSHLTMSLFDGANIEMY 522 

CYP52A2B 481 YVTVRLLQEFGHLSMDPNTEYPPRKMSHLTMSLFDGANIEMY 522 

CYP52A3A 499 YVTVRLLQEFAHLSMDPDTEYPPKLQNTLTLSLFDGADVRMY 540 

CYP52A3B 499 YVTVRLLQEFGNLSLDPNAEYPPKLQNTLTLSLFDGADVRMF 540 

CYP52A5A 476 YVLVRLVQEFSHVRLDPDEVYPPKRLTNLTMCLQDGAIVKFD 517 

CYP52A5B 476 YVLVRLVQEFSHIRLDPDEVYPPKRLTNLTMCLQDGAIVKFD 517 

CYP52A8A 471 YVLVRLVQEFPNLSQDPETKYPPPRLAHLTMCLFDGAHVKMS 512 

CYP52A8B 471 YVLVRLVQEFPSLSQDPETEYPPPRLAHLTMCLFDGAYVKMQ 512 

CYP52D4A 458 YVLARLTQCYTTIQLR-TTEYPPKKLVHLTMSLLNGVYIRTRT 499 
** ** * ^ ^ *** **.. * 

Figure 16C 
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Sequence Range: 1 to 1712 

10 20 30 40 50 60 70 80 90 100 

GGTACCGAfiC TCAC6AGTTT TCfiGATTTTC GAGTTTG6AT TGTTTCCTTT GTTGATT6AA TTGACGAAAC CAGA66TTTT CAA6ACAGAT AAGATTGGGT 

110 120 130 140 150 160 170 180 190 200 

TTATCAAAAC GCAGTTTGAA ATATTCCAGT T6GTTTCCAA GATATCTTGA AGAA6ATTGA CGATTTGAAA TTTGAAGAAG TGGAGAAGAT CTG6TTT6GA 

210 220 230 240 250 260 270 280 290 300 

TTGTT6GAGA ATTTCAAGAA TCTCAAGATT TACTCTAACG AC6GGTACAA CGAGAATTG7 ATTGAATTGA TCAAGAACAT GATCTTGGTG TTACAGAACA 
310 320 330 340 350 360 370 380 390 400 

TCAA6TTCTT G6ACCACACT 6A6AAT6CCA CAGATATACA A66CGTCAT6 TCATAAAAT6 GATGAGATTT ATCCCACAAT TGAA6AAA6A GTTTATG6AA 

410 420 430 440 450 460 470 480 490 500 

AGTGGTCAAC CAGAAGCTAA ACAGGAA6AA GCAAACGAAG AGGT6AAACA A6AAGAAGAA G6TAAATAA6 TATTTT6TAT TATATAACAA ACAAA6TAAG 

510 520 530 540 550 560 570 580 590 600 

GAATACAGAT TTATACAATA AATTGCCATA CTAGTCACGT GAGATATCTC ATCCATTCCC CAACTCCCAA GAAAAAAAAA AAGT6AAAAA AAAAATCAAA 

610 620 630 640 650 660 670 680 690 700 

CCCAAA6ATC AACCTCCCCA TCATCATCGT CATCAAACCC CCAGCTCAAT TC6CAAT6GT TAGCACAAAA ACATACACAG AAAGGGCATC AGCACACCCC 

MV STK TYT ERAS AHP> 

710 720 730 740 750 760 770 780 790 800 

TCCAAGGTT6 CCCAACGTTT ATTCCGCTTA ATGGAGTCCA AAAAGACCAA CCTCT6C6CC TCGATCGACG TGACCACAAC CGCCGAGTTC CTTTCGCTCA 
SKV AQRL FRL ME5 RRTM LCA SID VTTT AEF LSL> 

810 820 830 840 850 860 870 880 890 900 

TC6ACAAGCT CGGTCCCCAC ATCT6TCTCG TGAAGACGCA CATCGATATC ATCTCACACT TCA6CTACGA GGGCACGATT GAGCCGTT6C TTGTGCTTGC 
IDICL GPH ICL VKTH lOI ISO F5YE GTI EPL LVLA> 

910 920 930 940 950 960 970 980 990 1000 

AGAGCGCCAC GGGTTCTTGA TATTCGA6GA CAGGAAGTTT GCTGATATCG GAAACACCGT 6AT6TTGCAG TACACCTCGG 6C6TATACC6 GATC6CC6C6 
E«H 6FL IFED RKF ADI 6HTV MLQ YTS 6VYR IAA> 

1010 1020 1030 1040 1050 1060 1070 1080 1090 1100 

•*•••••••♦ 

TGGAGTGACA TCACGAACGC GCACGGA6TG ACTGGGAAGG GCGTC6TTGA A6G6TTGAAA CGCGGTGCGG AGGGGGTAGA AAAGGAAAGG GGCGTGTTGA 
iSD ITNA H6V TGK GVVE GLR RGA E6VE KER GVL> 

1110 1120 1130 1140 1150 1160 1170 1180 1190 1200 

TGTTGGCGGA 6TT6TCGA6T AAAGCCTC6T TGGCGCAT6G TGAATATACC CGT6A6ACGA TCGA6ATTGC CAAGACTGAT C6GGACTTCG TGATTG66TT 
MLAE LSS KGS LAH6 EYT RET lElA KSD REF VI6F> 

1210 1220 1230 1240 1250 1260 1270 1280 1290 1300 

CATC6CGCAG CGGGACATGG GGGGTA6AGA AGAAGGGTTT GATTGGATCA TCATGACGCC TGCTGTGGGG TTGGATGATA AAGGCGATGC GTTG6GCCAG 
lAQ RON 6GRE E6F DBI IHTP GV6 LOO ICGDA LCQ> 

1310 1320 1330 1340 1350 1360 1370 1380 1390 1400 

CAGTATAGGA CTGTTGATGA GGT6GTTCTC ACTG6TACCG AT6T6ATTAT TGTCGGGA6A GGGTT6TTTG GAAAAGGAAG AGACCCTGA6 GTGGAG6GAA 
QYR TVOE VVL TCT DVII V6R GLF GKGR OPE V£6> 

1410 1420 1430 1440 1450 1460 1470 1480 1490 1500 

AGAGATACAG GGATGCTGGA TGGAA6GCAT ACTTGAA6A6 AACTGGTCAG TTA6AATAAA TATT6TAATA AATAGGTCTA TATACATACA CTAA6CTTCT 
KRYR DAG WKA YIKR T6Q LE»> 

1510 1520 1530 1540 1550 1560 1570 1580 1590 1600 

•••••••••• 

A6GACGTCAT TGTAGiaTC 6AAGTTGTCT 6CTAGTTTA6 TTCTCATGAT TTC6AAAACC AATAACGCAA T66ATGTAGC A6GGATGGT6 GTTAGTGCGT 

1610 1620 1630 1640 1650 1660 1670 1680 1690 1700 

•••••••••• 

TCC76ACAAA CCCA6AGTAC 6CCGCCTCAA ACCACGTCAC ATTCGCCCTT TGCHCATCC GCATCACTTG CTTGAA66TA TCCAC6TACC A6TTGTAATA 

1710 
CACCTTGAAG AA 
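
The one-letter amino acid lines printed under the nucleotide rows of this figure can be reproduced by reading the sequence in codon triplets. The following is a minimal sketch, assuming the standard genetic code and assuming the stretch beginning at the ATG near position 657 reads as transcribed in the `orf_start` string; the function and variable names are illustrative only. Candida species are generally reported to decode CTG as serine rather than leucine, so a full-length translation would need that substitution; the fragment used here contains no in-frame CTG codon.

```python
# Standard genetic code, built from the conventional TCAG-ordered string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 45 nucleotides of the open reading frame as read from the figure
# (assumed transcription of the row covering positions 601-700).
orf_start = "ATGGTTAGCACAAAAACATACACAGAAAGGGCATCAGCACACCCC"
print(translate(orf_start))
```

The printed peptide can be compared against the first translated line of the figure, which begins MVSTKTYTERASAHP.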
1 gram (g) Whole fermentation broth (70°C) 

+ 

1 g Internal standard C15:0, 10 g/l in 1 N KOH (70°C) 

+ 

0.8 ml 6 N HCl 

+ 

6 ml Methyl-t-Butyl-Ether (MtBE) 

↓ Extract in 60 ml separatory funnel 

1 ml MtBE phase pipetted in 12 x 75 mm test tube 

Dry down to solids under N2 stream 

↓ 

Add 1 ml 12% BF3-Methanol (Kodak, 4°C) and stopper test tube 
Dissolve solids, esterify for 15 min. @ 60°C, quiescently 

Add 0.25 ml saturated NaCl solution (71.5 g NaCl/200 ml H2O) 

↓ Vortex to mix 

Add 1 ml Mixed Ethers (50% diethyl ether, 50% petroleum ether, v/v) 

Shake for 1 min. to extract methyl esters 
↓ 

Inject 5 µl of mixed ether phase into GC 



GC Parameters 

Column: HP-INNOWAX capillary column, 30 m x 0.32 mm, 0.5 µm film thickness 

Split ratio: 1:100 

Column Head Pressure: 13.5 psig 
Injector temperature: 240°C 
FID Detector Temp.: 250°C 

Temp. Prog.: 90°C for 0 min. to 190°C @ 7°C/min. for 0 min. to 235°C @ 
12°C/min. for 30 min. 

Figure 35 
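
The oven temperature program in Figure 35 fixes the length of each GC run. The following is a minimal sketch of that arithmetic, assuming the program reads as 90°C with no initial hold, a 7°C/min ramp to 190°C with no hold, then a 12°C/min ramp to 235°C with a 30 min hold; the function name and tuple layout are illustrative only.

```python
def oven_program_minutes(start_temp_c, initial_hold_min, ramps):
    """Return the total run time of a multi-ramp GC oven program.

    ramps is a list of (rate_c_per_min, target_temp_c, hold_min) tuples,
    applied in order starting from start_temp_c.
    """
    total = initial_hold_min
    current = start_temp_c
    for rate, target, hold in ramps:
        total += (target - current) / rate + hold
        current = target
    return total


# Program as read from Figure 35 (assumed reading of the figure text).
ramps = [(7.0, 190.0, 0.0), (12.0, 235.0, 30.0)]
run_time = oven_program_minutes(90.0, 0.0, ramps)
print(f"{run_time:.2f} min")
```

Under these assumptions a single injection occupies the oven for roughly 48 minutes, not counting cool-down back to 90°C.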
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SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: Wilson, C. Ron 
Craft, David L. 
Eirich, Dudley 
Eshoo, Mark 
Madduri, Krishna N. 
Cornett, Cathy A. 
Brenner, Alfred A. 
Tang, Maria 
Loper, John C. 
Gleeson, Martin 

(ii) TITLE OF INVENTION: CYTOCHROME P450 MONOOXYGENASE AND NADPH 
CYTOCHROME P450 OXIDOREDUCTASE GENES AND PROTEINS RELATED 
TO THE OMEGA HYDROXYLASE COMPLEX OF CANDIDA TROPICALIS AND 
METHODS RELATING THERETO 

(iii) NUMBER OF SEQUENCES: 107 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: HENKEL CORPORATION 

(B) STREET: 2500 Renaissance Boulevard, Suite 200 

(C) CITY: Gulph Mills 

(D) STATE: PA 

(E) COUNTRY: U.S.A. 

(F) ZIP: 19406 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 



(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 
(A) NAME: Drach, John E. 

(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 
CCTTAATTAA ATGCACGAAG CGGAGATAAA AG 32 
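
Each entry in this listing states a LENGTH and then gives the sequence in blocks of ten, sometimes followed by the length again. The following is a minimal sketch of a consistency check between the two, assuming entries are supplied as plain strings; the function name, the return format, and the second example string are hypothetical and not part of the listing.

```python
# Nucleotide symbols permitted in a listing, including IUPAC ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTRYKMSWBDHVN")


def check_entry(declared_length, sequence_line):
    """Compare a declared LENGTH with the residues actually printed.

    Returns (length_matches, counted_length, unexpected_symbols).
    """
    tokens = sequence_line.split()
    # A trailing all-digit token is the printed length, not sequence.
    if tokens and tokens[-1].isdigit():
        tokens = tokens[:-1]
    sequence = "".join(tokens).upper()
    unexpected = sorted(set(sequence) - IUPAC_NUCLEOTIDES)
    return len(sequence) == declared_length, len(sequence), unexpected


# SEQ ID NO:1 as printed in the listing.
result_seq1 = check_entry(32, "CCTTAATTAA ATGCACGAAG CGGAGATAAA AG 32")
print(result_seq1)

# Hypothetical damaged line, to show how a stray symbol is reported.
result_damaged = check_entry(10, "CCTTAATT6A")
print(result_damaged)
```

A check of this kind is useful mainly for catching transcription damage: a digit or punctuation mark inside a sequence row is reported even when the residue count happens to match.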



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(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2 
CCTTAATTAA GCATAAGCTT GCTCGAGTCT 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3 
CCTTAATTAA ACGCAATGGG AACATGGAGT G 



(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
CCTTAATTAA TCGCACTACG GTTATTGGTA TCAG 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
CCTTAATTAA TCAAAGTACG TTCAGGCGG 



(2) INFORMATION FOR SEQ ID NO:6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
CCTTAATTAA GGCAGACAAC AACTTGGCAA AGTC 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 
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(C) STRAKDEDMESS : single 

(D) TOPOLOGY: linear 

(ii) MOLBCDXiE TYPE: other nucleic acid 
(xi) SEQUENCE OESCRXPTION: SEQ ID NO: 7: 
CCTTAATT2JI GAGGTCGTT6 GTTGAGTTTT C 31 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
CCTTAATTAA TTGATAATGA CGTTGCGGG 29 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 
AGGCGCGCCG GAGTCCAAAA AGACCAACCT CTG 33 

(2) INFORMATION FOR SEQ ID NO:10: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
CCTTAATTAA TACGTGGATA CCTTCAAGCA AGTG 34 

(2) INFORMATION FOR SEQ ID NO: 11: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
CCTTAATTAA GCTCACGAGT TTTGGGATTT TCGAG 35 

(2) INFORMATION FOR SEQ ID NO: 12: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GGGTTTAAAC CGCAGAGGTT GGTCTTTTTG GACTC 35 
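
SEQ ID NO:9 and SEQ ID NO:12 appear to share a common core in opposite orientations: apart from their 5' ends, one reads as the reverse complement of the other. The following is a minimal sketch of that comparison, assuming the two sequences are transcribed as plain A/C/G/T and assuming the first 9 and 10 nucleotides are 5' tails (they match SEQ ID NO:14 and SEQ ID NO:13 respectively); the variable names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Sequences as read from the listing, block spacing removed (assumed transcription).
seq9 = "AGGCGCGCCGGAGTCCAAAAAGACCAACCTCTG"
seq12 = "GGGTTTAAACCGCAGAGGTTGGTCTTTTTGGACTC"

core9 = seq9[9:]     # after the 9-nt tail AGGCGCGCC
core12 = seq12[10:]  # after the 10-nt tail GGGTTTAAAC

shared = core9[1:]   # 23 nt common to both oligonucleotides
print(shared)
print(reverse_complement(core12).startswith(shared))
```

If the transcription is right, the two oligonucleotides anneal to the same 23-nucleotide stretch from opposite strands, which is the arrangement expected for primers that end one amplified fragment and begin the adjacent one.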
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(2) INFORMATION FOR SEQ ID NO: 13: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GGGTTTAAAC 10 

(2) INFORMATION FOR SEQ ID NO: 14: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
AGGCGCGCC 9 

(2) INFORMATION FOR SEQ ID NO: 15: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CCTTAATTAA 10 



(2) INFORMATION FOR SEQ ID NO: 16: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 3..4 

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 9..10 

(D) OTHER INFORMATION: /note= "w=dATP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 15..16 

(D) OTHER INFORMATION: /note= "w=dATP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 18..19 

(D) OTHER INFORMATION: /note= "w=dATP or dTTP" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
TCYCAAACHG CTTACHGCrWGA A 21 
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(2) INFORMATION FOR SEQ ID N0:17: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 12..13 

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 15..16 

(D) OTHER INFORMATION: /note= "w=dATP or dTTP" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GGTTTGGGTA AYTCWACTTA T 21 

(2) INFORMATION FOR SEQ ID NO: 18: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CGTTATTATC ATTTCTTC 18 

(2) INFORMATION FOR SEQ ID NO: 19: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 3..4 

(D) OTHER INFORMATION: /note= "m=dATP or dCTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 9..10 

(D) OTHER INFORMATION: /note= "r=dATP or dGTP" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GCMACACCRCHA CCTGGACC 20 

(2) INFORMATION FOR SEQ ID NO: 20: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
ATCCCAATCG TAATCAGC 18 
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(2) INFORMATION FOR SBQ ID N0:21: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
ACTTGTCTTC GTTTAGCA 18 

(2) INFORMATION FOR SEQ ID NO:22: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
CTACGTCTGT GGTGATGC 18 

(2) INFORMATION FOR SEQ ID NO:23: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 3..4 

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 6..7 

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 9..10 

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 12..13 

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP" 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 15..16 

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
CGNGAYACNA CNGCNGG 17 

(2) INFORMATION FOR SEQ ID NO: 24: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 
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(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 3..4 

(D) OTHER INFORMATION: /note= "r=dATP or dGTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 6..7 

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 9..10 

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP" 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 12..13 

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP" 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 15..16 

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
AGRGAYACNA CNGCNGG 17 

(2) INFORMATION FOR SEQ ID NO:25: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 3..4 

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP" 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 6..7 

(D) OTHER INFORMATION: /note= "r=dATP or dGTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 9..10 

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 12..13 

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 15..16 
(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
AANGCRAAYT GYTGNCC 17 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 1..2 

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 
(B) LOCATION: 4..5 

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 7..8 

(D) OTHER INFORMATION: /note= "r=dATP or dGTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 10..11 

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 13..14 

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 16..17 

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
YAANGCRAAY TGYTGNCC 18 
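
The feature notes above define each ambiguity symbol as a choice of nucleotides (y = C or T, r = A or G, n = any of the four), so a degenerate primer stands for a pool of distinct oligonucleotides. The following is a minimal sketch of how the pool size follows from those definitions, using the standard IUPAC nucleotide codes and SEQ ID NO:26 as the example; the function names are illustrative only.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for symbol in primer.upper():
        count *= len(IUPAC[symbol])
    return count


def expand(primer):
    """List every unambiguous sequence represented by a degenerate primer."""
    choices = (IUPAC[symbol] for symbol in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


primer = "YAANGCRAAYTGYTGNCC"  # SEQ ID NO:26 with block spacing removed
pool = expand(primer)
print(len(primer), degeneracy(primer), len(pool))
```

For this primer the three Y positions, one R position and two N positions multiply to 2 x 2 x 2 x 2 x 4 x 4 = 256 members, which is why full expansion is practical here but grows quickly as more N positions are added.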



(2) INFORMATION FOR SEQ ID NO:27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
ATTCAACGGT GGTCCAAGAA TCTGTTTGG 29 



(2) INFORMATION FOR SEQ ID N0:28: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 
GAGCTATGTT GAGACCACAG TTTGC 25 

(2) INFORMATION FOR SEQ ID NO:29: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
CTTCAGTTAA AGCAAATTGT TTGGCC 26 

(2) INFORMATION FOR SEQ ID NO:30: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
CTCGGGAAGC GCGCCATTGT GTTGG 25 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
TAATACGACT CACTATAGGG CGAATTGGC 29 

(2) INFORMATION FOR SEQ ID NO:32: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 3..4 

(D) OTHER INFORMATION: /note= "r=dATP or dGTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 4..5 

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 16..17 

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
TGRYTCAAAC CATCTYTCTG G 21 
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(2) INFORMATION FOR SEQ ID NO: 33: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 
GGACCGGCGT TAAAGGG 17 



(2) INFORMATION FOR SEQ ID NO:34: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 9..10 

(D) OTHER INFORMATION: /note= "w=dATP or dTTP" 
(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 12..13 

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
CATAOTCGWA TYATGCTTAG ACC 23 

(2) INFORMATION FOR SEQ ID NO:35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
GGACCACCAT TGAATGG 17 

(2) INFORMATION FOR SEQ ID NO: 36: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 540 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

ATGATTGAAC AACTCCTTAGA ATATTGGTAT GTCGTTGTGC CAGTGTTGTA CATCATCAAA 
CAACTCCTTG CATACACAAA GACTCGCGTC TTGATGAAAA AGTTGGGTGC TGCTCCAGTC 
ACAAACAAGT TGTACGACAA CGCTTTCGGT ATCGTCAATG GATGGAAGGC TCTCCAGTTC 
AAGAAAGAGG GCAGGGCTCA AGAGTACAAC GATTACAAGT TTGACCACTC CAAGAACCCA 
AGCGTGGGCA CCTACGTCAG TATTCTTTTC GGCACCAGGA TCGTCGTGAC CAAAGATCCA 
GAGAATATCA AAGCTATTTT GGCAACCCAG TTTGGTGATT TTTCTTTGGG CAAGAGGCAC 
ACTCTTTTTA AGCCTTTGTT AGGTGATGGG ATCTTCACAT TGGACGGCGA AGGCTGGAAG 
CACAGCAGAG CCATGTTGAG ACCACAGTTT GCCAGAGAAC AAGTTGCTCA TGTGACGTCG 
TTGGAACCAC ACTTCCAGTT GTTGAAGAAG CATATTCTTA AGCACAAGGG TGAATACTTT 
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2) INFORMATION FOR SEQ ID NO:37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37 
CCGATGAAGT TTTCGACGAG TACCC 

(2) INFORMATION FOR SEQ ID NO:38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2S base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38 
AAGGCTTTAA CGTGTCCAAT CTGGTC 

(2) INFORMATION FOR SEQ ID NO: 39: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39 
ATTATCGCCA CATACTTCAC CAAATGG 

(2) INFORMATION FOR SEQ ID NO: 40: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40 
CGAGATCGTG GATACGCTGG AGTG 

(2) INFORMATION FOR SEQ ID NO: 41: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41 
GCCACTCGGT AACTTTGTCA GGGAC 

(2) INFORMATION FOR SEQ ID NO: 42: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42 
CATTGAACTG AGTAGCCAAA ACAGCC 

(2) INFORMATION FOR SEQ ID NO:43: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 
CCTACGTTTG GTATCGCTAC TCCGTTG 

(2) INFORMATION FOR SEQ ID NO:44: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: 
TTTCCAGCCA GCACCGTCCA AG 

(2) INFORMATION FOR SEQ ID NO:45: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45 
GCAGAGCCGA TCTATGTTGC GTCC

(2) INFORMATION FOR SEQ ID NO: 46: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46 
TCATTGAATG CTTCCAGGAA CCTCG 

(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47 
AAGAGGGCAG GGCTCAAGAG

(2) INFORMATION FOR SEQ ID NO: 48: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48 
TCCATGTGAA GATCCCATCA C 

(2) INFORMATION FOR SEQ ID NO:49:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49
CTTGAAGGCC GTGTTGAACG 

(2) INFORMATION FOR SEQ ID NO: 50: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50 
CAGGATTTGT CTGAGTTGCC G

(2) INFORMATION FOR SEQ ID NO: 51: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
CCATTGCCTT GAGATACGCC ATTGGTAG 

(2) INFORMATION FOR SEQ ID NO: 52: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 
AGCCTTGGTG TCGTTCTTTT CAACGG

(2) INFORMATION FOR SEQ ID NO:53: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53
TTGGGTTTGT TTGTTTCCTG TGTCCG

(2) INFORMATION FOR SEQ ID NO: 54: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
CCTTTGACCT TCAATCTGGC GTAGACG 

(2) INFORMATION FOR SEQ ID NO:55: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55 
GTTTGCTGAA TACGCTGAAG GTGATG

(2) INFORMATION FOR SEQ ID NO: 56: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56
TGGAGCTGAA CAACTCTCTC GTCTCGG

(2) INFORMATION FOR SEQ ID NO:57: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57
TTCCTCAACA CGGACAGCGG

(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58 
AGTCAACCAG GTGTGGAACT CGTC 

(2) INFORMATION FOR SEQ ID NO: 59: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 49 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
GGATCCTAAT ACGACTCACT ATAGGGAGGA AGAGGGCAGG GCTCAAGAG 49
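
As printed, SEQ ID NO: 59 appears to be the 20 bases of SEQ ID NO: 47 preceded by a 29-base 5' tail, and the same tail recurs on several of the longer oligonucleotides that follow. A minimal sketch of that decomposition is given below. It assumes the scanned "6" stands for "G"; the function names are hypothetical, and `T7_CORE` is the commonly cited T7 RNA polymerase promoter core, included as an assumption about what the tail contains, not as a statement taken from the listing.

```python
def normalize(printed: str) -> str:
    """Strip group spacing and read the scan artifact '6' as 'G' (assumed OCR error)."""
    return "".join(printed.split()).upper().replace("6", "G")


# Sequences as they appear in the scanned listing (length suffix omitted).
SEQ_47 = normalize("AA6AGG6CA6 GGCTCAAGAG")
SEQ_59 = normalize("GGATCCTAAT ACGACTCACT ATA6GGAGGA AGAGGGCAGG GCTCAAGAG")

# Assumption: commonly cited T7 promoter core sequence.
T7_CORE = "TAATACGACTCACTATAG"


def split_tailed_primer(tailed: str, gene_specific: str) -> tuple[str, str]:
    """Split a tailed primer into (5' tail, gene-specific 3' portion)."""
    if not tailed.endswith(gene_specific):
        raise ValueError("gene-specific primer is not the 3' end of the tailed primer")
    return tailed[: -len(gene_specific)], gene_specific


tail, body = split_tailed_primer(SEQ_59, SEQ_47)
print(tail, len(tail), T7_CORE in tail)
```

The same split can be tried on other pairs in the listing (for example SEQ ID NO: 63 against SEQ ID NO: 39) once their scan artifacts have been resolved.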

(2) INFORMATION FOR SEQ ID NO:60:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
TCCATGTGAA GATCCCATCA CGAGTGTGCC TCTTGCCCAA AG 42 

(2) INFORMATION FOR SEQ ID NO: 61: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 
GGATCCTAAT ACGACTCACT ATAGGGAGGC CGATGAAGTT TTCGACGAGT ACCC 54

(2) INFORMATION FOR SEQ ID NO: 62: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 52 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 
AAGGCTTTAA CGTGTCCAAT CTGGTCAACA TAGCTCTGGA OTGCTTCCAA CC 52

(2) INFORMATION FOR SEQ ID NO:63: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 56 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 
GGATCCTAAT ACGACTCACT ATAGGGAGGA TTATCGCCAC ATACTTCACC AAATGG 56 

(2) INFORMATION FOR SEQ ID NO: 64: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 52 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 
CGAGATCGTG GATACGCTGG AGTGCGTCGC TClTeriflT CAACAATTCA AG 52

(2) INFORMATION FOR SEQ ID NO:65:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 49 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:65:
CATTGAACTG AGTAGCCAAA ACAGCCCATG GTTTCAATCA ATGGGAGGC 49

(2) INFORMATION FOR SEQ ID NO:66: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 
GGATCCTAAT ACGACTCACT ATAGGGAGGG CCACTCGGTA ACTTTGTCAG GGAC 54

(2) INFORMATION FOR SEQ ID NO: 67: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 56 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 
GGATCCTAAT ACGACTCACT ATAGGGAGGC CTACGTTTGG TATCGCTACT CCGTTG 56 

(2) INFORMATION FOR SEQ ID NO:68:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 
TTTCCAGCCA GCACCGTCCA AGCAACAAGG AGTACAAGAA ATCGTGTC 48

(2) INFORMATION FOR SEQ ID NO:69: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 53 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 
GGATCCTAAT ACGACTCACT ATAGGGAGGG CAGAGCCGAT CTATGTTGCG TCC 53

(2) INFORMATION FOR SEQ ID NO:70: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:
TCATTGAATG CTTCCAGGAA CCTCGCCACA TCCATCGAGA AC0GG 45

(2) INFORMATION FOR SEQ ID NO: 71: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 
GGATCCTAAT ACGACTCACT ATAGGGAGGC TTGAAGGCCG TGTTGAACG 49

(2) INFORMATION FOR SEQ ID NO: 72: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 
CAGGATTTGT CTGAGTTGCC GCCTGATCAA GATAGGATCC TTGCCG 46 

(2) INFORMATION FOR SEQ ID NO: 73: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 56 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 
GGATCCTAAT ACGACTCACT ATAGGGAGGG GTTTGCTGAA TACGCTGAAG GTGATG 56

(2) INFORMATION FOR SEQ ID NO: 74: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 52 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 
TGGAGCTGAA CAACTCTCTC GTCTCGGGTG GTCGAATGC5A CCCTTCGTCA AG 52 

(2) INFORMATION FOR SEQ ID NO: 75: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 
GGATCCTAAT ACGACTCACT ATAGGGAGGT TCCTCAACAC GGACAGCGG 49

(2) INFORMATION FOR SEQ ID NO: 76: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 49 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:
AGTCAACCAG GTGTGGAACT CGTCGGTGGC AACAATGAAA AACACCAAG

(2) INFORMATION FOR SEQ ID NO: 77: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 
GGATCCTAAT ACGACTCACT ATAGGGAGGC CATTGCCTTG AGATACGCCA TTGGTAG

(2) INFORMATION FOR SEQ ID NO:78:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 53 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78: 
AGCCTTGGTG TCGTTCTTTT CAACGGAAGG TGGTCTCGAT GGTGTGTTCA ACC 

(2) INFORMATION FOR SEQ ID NO: 79: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 55 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 
GGATCCTAAT ACGACTCACT ATAGGGAGGT TGGGTTTGTT TGTTTCCTGT GTCCG 

(2) INFORMATION FOR SEQ ID NO: 80: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80: 
CCTTTGACCT TCAATCTGGC GTAGACGCAG CACCACCGAT CCACCACTIG 

(2) INFORMATION FOR SEQ ID NO: 81: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4206 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81:

CarCAAGATC ATCTATGGGG ATAATTACGA CAGCAAGATT GCAGAAAGAG CGTTGGTCAC 60 

AATCGAAAGA GCCTATGGCG TTGCCGTCGT TGAGGCAAAT GACAGCACCA ACAATAACGA 120 

TGGTC<XAGT GAASAGCCTT CAGAACAGTC CATTGTTGAC GCTTAAtSGCA CGGATA ATTA 180 

CGTGGGGCAA AGGAACGCXK5 AATTAGTTAT GGGGGGATCA AAAGCGGAAG ATTTGTGTTG 240

CrtGTGGGTT Tm ' CCrrm TTTTTCATAT GATTTCTTTG CGCAAGTAAC ATGTG CCAAT 300 

TTAGTTTCTG ATTAGCGTGC CCCACAATTG GCATCGTGGA CGGGCGTGTT TTGTCATACC 360 

CCAASTCTTA ACTAGCTCCA CAGTCTCGAC GGTGTCTCGA CGATGTCTTC TTCCSUXICCT 420 

CCC3V.TGAATC ATTCAAAGTT GTTGGGGGAT CTCCACCAAG GGCACCGGAG TTAATGCTTA 480

TGTTTCTCCC ACTTTGGTTG TGATTGGGGT AGTCTAGTGA GTTGGAGATT TTCTTTTTTT 540 

CGCAGGTGTC TCCGATATCG AAATTTGATG AATATAGAGA GAAGCCAGAT CAGCACAGTA 600

GATTGCCTTT GTAGTTAGAG ATGTTGAACA GCAACTAGTT GAATTACAOG CCACCACTTG 660 

ACAGCAAGTG CAGTGAGCTG TAAACGATGC AGCCAGAGTG TCACXUVCCAA CTGACGTTGG 720

GTGGAGTTGT TGTTGTTGTT GTTGGCAGGG CCATATTGCT AAACGAAGAC AAGTAGC71CA 780

AAACCCAAGC TTAAGAACAA AAATAAAAAA AATTCATACG ACAATTCCAA AGCCATTOAT 840 

TTACATAATC AACAGTAAGA CAGAAAAAAC TTTCAACATr TCAAAGTTCC Cl'lTTT CCTA 900 

TTACTTCTTT rrrrrCXTC T TTCCTTCTTT CCTTCTGrTT TTCTTACTTT ATCAGTCTTT 960 

TACTTGTTTT TGCAATTCCT CATCCTCCTC CTACTCCTCC TCACCATGGC TTTAGACAAG 1020

TTAGATTTGT ATGTCATCAT AACATTGGTG GTCGCTGTAG CCGCCTATTT TGCTAAGAAC 1080 

CAGTTCCTTG ATCAGCCCCA GGACACCGGG TTCCTCAACA CGGACAGCGG AAGCAACTCC 1140 

AGAGACGTCT TGCTGACATT GAAGAAGAAT AATAAAAACA CGTTGTTGTT GTTTGGGTCC 1200 

CAGACGGGTA CGGCAGAAGA TTACGCCAAC AAATTGTCCA GAGAATTGCA CTCCAGATTT 1260

GGCTTGAAAA CGATGGTTGC AGATTTCGCT GATTACGATT GGGATAACTT CGGAGATATC 1320 

ACCGAAGACA TCTTGGTGTT TTTCATTGTT GCCACCTATG GTGAGGGTGA ACCTACCGAT 1380

AATGCCGACG AGTTCCACAC CTGGTTGACT GAAGAAGCTG ACACTTTGAG TACCTTGAAA 1440 

TACACCGTGT TCGGGTTGGG TAACTCCACG TACGAGTTCT TCAATGCCAT TGGTAGAAAG 1500

TTTGACAGAT TGTTGAGOGA GAAAGGTGGT GACAGGTTTG CTGAATACGC TGAAGGTGAT 1560

GACGGTACTG GCACCTTGGA CGAAGATTTC ATGGCCTGGA AGGACAATGT CTTTGACGCC 1620

TTGAAGAATG ATTTGAACTT TGAAGAAAAG GAATTGAAGT ACGAACCAAA CGTGAAATTG 1680

ACTGAGAGAG ACGACTTGTC TGCTGCTGAC TCCCAAGTTT CCTTGGGTGA GCCAAACAAG 1740 

AAGTACATCA ACTCCGAGGG CATCGACTTG ACCAAGGGTC CATTCGACCA CACCCACCCA 1800

TACTTGGCCA GAATCACCGA GACGAGAGAG TTGTTCAGCT CCAAGGACAG ACACTGTATC 1860 

CACGTTGAAT TTGACATTTC TGAATCGAAC TTGAAATACA CCACCGGTGA CCATCTAGCT 1920

ATCTCGCCAT CCAACTCCGA CGAAAACATT AAGCAATTTG CCAAGTGTTT CGGATTGGAA 1980

GATAAACTCG ACACTGTTAT TGAATTGAAG GCGTTGGACT CCACTTACAC CATCCCATTC 2040 

CCAACCCCAA TTACCTACGG TGCTGTCATT AGACACCATT TAGAAATCTC CGGTCCAGTC 2100

TCGAGACAAT TCTTTTTGTC AATTGCTGGG TTTGCTCCTG ATGAAGAAAC AAAGAAGGCT 2160

TTTACC3U5AC TTGGTGGTGA CAAGCAAGAA TTCGCCGCCA AGGTCACCCG CAGAAAGTTC 2220 

AACATTCCCG ATGCCTTGTT ATATTCXTTCC AACAACGCTC CATGGTCCGA TGTTCCTTTT 2280 

GAATTCCTTA TTGAAAACGT TCCACACTTG ACTCCACGTT ACTACTCCAT TTCGTCTTCG 2340 

TCATTGAGTG AAAAGCAACT CATCAACGTT ACTGCAGTTG TTGAAGCCGA AGAAGAAGCT 2400

GATGGCAGAC CAGTCACTGG TCTTGTCACC AACTTGTTGA AGAACGTTGA AATTGTGCAA 2460 

AACAAGACTG GCGAAAAGCC ACTTGTCCAC TACGATTTGA GCXMCCCAAG AGGCAAGTTC 2520 

AACAAGTTCA AGTTGCCAGT GCATGTGAGA AGATCCAACT TTAAGTTGCC AAAGAACTCC 2580 

ACCACCCCAG TTATCTTGAT TGGTCCAGGT ACTGGTGTTG CCCCATTGAG AGGTTTTGTC 2640

AGAGAAAGAG TTCAACAAGT CAAGAATGGT GTCAATGTTG GCAAGACTTT GTTGTTTTAT 2700 

GGTTGCAGAA ACTCCAACGA GGACTTTTTG TACAAGCAAG AATGGGCCGA GTACGCTTCT 2760 

GTTTTGGGTG AAAACTTTGA GATGTTCAAT GCCTTCTCCA GACAAGACCC ATCCAAGAAG 2820 

GTTTACGTCC AGGATAAGAT TTTAGAAAAC AGCCAACTTG TGCAOGAGTT GTTGACTGAA 2880

GGTGCCATTA TCTACGTCTG TGGTGATGCC AGTAGAATGG CTAGAGACGT GCAGACCACA 2940 

ATTTCCAAGA TTGTTGCTAA AAGCAGAGAA ATTAGTGAAG ACAAGGCTGC TGAATTGGTC 3000

AAGTCCTGGA AGGTCCAAAA TAGATACCAA GAAGATGTTT GGTAGACTCA AACGAATCTC 3060 

TCTTTCTCCC AACXSCATTTA TGAATCTTTA TTCTCATTGA AGCTTTACAT ATGTTCTACA 3120 

criTArrriT rxTi " m " m ttattattat attaogaaac ataggtcaac tatatatact 3180 

TGATTAAATG TTATAGAAAC AATAACTATT ATCTACTCGT CTACTTCTTT GGCATTGACA 3240 

TCAACATTAC CGTTCCCATT ACCGTTGCCG TTGGC3UITGC CGGGATATTT AOTACAOTAT 3300

CTCCAATCCG GATTTGAGCT ATTGTAGATC AGCTGCAAGT CATTCTCCAC CTTCAACCAG 3360

TACTTATACT TCATCTTTGA CTTCAAGTCC AAGTCATAAA TATTACAAGT TAGCAAGAAC 3420 

TTCTGGCCAT CCACGATATA GACGTTATTC ACGTTATTAT GCGACX5TATG GATGTGGTTA 3480 

TCCTTATTGA ACTTCTCAAA CTTCAAAAAC AACCCCACXTT CCCGCAACGT CATTATCAAC 3540

GACAASTTCT GGCTCACGTC GTCGGASCTC GTCAASTTCT CaUlTTAGATC GTTCTTGTTA 3600

TTGATCTTCT GGTACTTTCT CAATTGCTGG AACACATTGT CCTCGTTGTT CAAATAGATC 3660

TTGAACAACT TTTTCAACGG GATCAACTTC TCAATCTGGG CCAAGATCTC CGCCGGGATC 3720 

TTCAGAAACA AGTCXrTGCAA CCCCTGGTCG ATGGTCTCCG GGTACAACAA GTCCAAGGGG 3780

CAGJUUnXSTC TAGGCACGTG TTTCAACTGG TTCAAOGAAC ATGTTCGACA GTAGTTCGAG 3840

TTATAGTTAT CGTACAACCA TTTTGGTTTG ATTTCGAAAA TGACGGAGCT GATGCCATCA 3900 

TTCTCCTGGT TCCTCTCATA GTACAACTGG CACTTCTTCG AGAGGCTCAA TTCCTCGTAG 3960

TTCCCGTCCA AGATATTCGG CAACAAGAGC CCGTACCGCT CACGGAGCAT CAAGTCX3TGG 4020 

CCCTGGTTGT TCAACTTGTT GATGAAGTCC GAGGTCAASA CAATCAACTG GATGTCGATG 4080 

ATCTGGTGCXS GGAACAAGTT CTTGCATTTT AGCTCGATGA AGTCGTACAA CTCACACGTC 4140

GAGATATACT CCTGTTCCTC CTTCAAGAGC CGGATCCGCA AGAGCTTGTG CTTCAAGTAG 4200

Tcxnrra 4206 
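
The open reading frame in SEQ ID NO: 81 appears to begin at the ATG at positions 1006 to 1008, and translating the following codons reproduces the first residues printed for SEQ ID NO: 83. The sketch below illustrates this with the first 48 bases of the frame as printed above. The function names are hypothetical, and the optional CUG-as-serine switch reflects the commonly reported codon usage of Candida species; it is labeled here as an assumption and does not affect this excerpt, which contains no CTG codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(cug_as_serine: bool = False) -> dict[str, str]:
    """Build a codon -> one-letter amino acid map.

    cug_as_serine: assumption switch for hosts reported to read CUG as Ser.
    """
    table = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD)}
    if cug_as_serine:
        table["CTG"] = "S"
    return table


def translate(dna: str, table: dict[str, str]) -> str:
    """Translate in frame from the first base, stopping at a stop codon."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Positions 1006-1053 of SEQ ID NO: 81 as printed in the listing.
ORF_START = "ATGGC TTTAGACAAG TTAGATTTGT ATGTCATCAT AACATTGGTG GTC"

print(translate(ORF_START, codon_table(cug_as_serine=True)))
```

The one-letter output corresponds to Met Ala Leu Asp Lys Leu Asp Leu Tyr Val Ile Ile Thr Leu Val Val, the first sixteen residues of SEQ ID NO: 83.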



(2) INFORMATION FOR SEQ ID NO: 82: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4145 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:82: 

TATATGATAT ATGATATATC TTCCTGTGTA ATTATTATTC GTATTCGTTA ATACTTACTA 60 

CATTTTTTTT TCTTTATTTA TGAAGAAAAG GAGAGTTCGT AAGTTGAGTT GAGTAGAATA 120 

GGCTGTTGTG CATACGGGGA GCAGAGGAGA GTATCCGACG AGGAGGAACT GGGTGAAATT 180

TCATCTATGC TGTTGCGTCC TGTACTGTAC TGTAAATCTT AGATTTCCTA GAGGTTGTTC 240 

TAGCAAATAA AGTGTTTCAA GATACAATTT TACAGGCAAG GGTAAAGGAT CAACTGATTA 300

GCGGAAGATT GGTGTTGCCT GTGGGGTTCT TTTATTTTTC ATATGATTTC TTTGCGCGAG 360

TAACATGTGC CAATCTAGTT TATGATTAGC GTACCTCCAC AATTGGCATC TTGGACGGGC 420 

GTGTTTTGTC TTACCCCAAG CCTTATTTAG TTCCACAGTC TCGACGGTGT CTCGCCGATG 480 

TCTTCTCCCA CCCCTCGCAG GAATCATTCG AAGTTGTTGG GGGATCTCCT CCGCAGTTTA 540 

TGTTCATGTC TTTCCCACTT TGGTTGTGAT TGGGGTAGCG TAGTGAGTTG GTGATTTTCT 600 

TTTTTCGCAG GTGTCTCCGA TATCGAAGTT TGATGAATAT AGGAGCCAGA TCAGCATGGT 660

ATATTGCCTT TGTAGATAGA GATGTTGAAC AACAACTAGC TGAATTACAC ACCACCGCTA 720 

AACGATGCGC ACAGGGTGTC ACCGCCAACT GACGTTGGGT GGAGTTGTTG TTGGCAGGGC 780 

CATATTGCTA AACGAAGAGA AGTAGCACAA AACCCAAGGT TAAGAACAAT TAAAAAAATT 840

CATACGACAA TTCCACAGCC ATTTACATAA TCAACAGCGA CAAATGAGAC AGAAAAAACT 900 

TTCAACATTT CAAAGTTCCC TTTTTCCTAT TACTTCTTTT TTTCTTTCCT TCCTTTCATT 960

TCCTTTCCTT CTGCTTTTAT TACTTTACCA GTCTTTTGCT TGTTTTTGCA ATTCCTCATC 1020

CTCCTCCTCA CCATGGCTTT AGACAAGTTA GATTTGTATG TCATCATAAC ATTGGTGGTC 1080 

GCTGTGGCCG CCTATTTTGC TAAGAACCAG TTCCTTGATC AGCCCCAGGA CACCGGGTTC 1140 

CTCAACACGG ACAGCGGAAG CAACTCCAGA GACGTCTTGC TGACATTGAA GAAGAATAAT 1200

AAAAACACGT TGTTGTTGTT TGGGTCCCAG ACCGGTACGG CAGAAGATTA CGCCAACAAA 1260

TTGTCAAGAG AATTGCACTC CAGATTTGGC TTGAAAACCA TGGTTGCAGA TTTCGCTGAT 1320

TACGATTGGG ATAACTTCGG AGATATCACC GATUSAIATCT TGGTGTTTTT CATCGTTGCC 1380 

ACCTACGGTG AGGGTGAACC TACCGACAAT GCCGACGAGT TCCACACCTG GTTGACTGAA 1440 

GAAGCTGACA CTTTGAGTAC TTTGAGATAT ACCGTGTTCG GGTTGGGTAA CTCCACCTAC 1500

GAGTTCTTCA ATGCTATTGG TAGAAAGTTT GACAGATTGT TGAGTGAGAA AGGTGGTGAC 1560

AGATTTGCTG AATATGCTGA AGGTGACGAC GGCACTGGCA CCTTGGACGA AGATTTCATG 1620

GCCTGGAAGG ATAATGTCTT TGACGCCTTG AAGAATGACT TGAACTTTGA AGAAAAGGAA 1680

TTGAAGTACG AACCAAACGT GAAATTGACT GAGAGAGATG ACTTGTCTGC TGCCGACTCC 1740 

CAAGTTTCCT TGGGTGAGCC AAACAAGAAG TACATCAACT CCGAGGGCAT CGACTTGACC 1800

AAGGGTCCAT TCGACCACAC CCACCCATAC TTGGCCAGGA TCACCGAGAC CAGAGAGTTG 1860

TTCAGCTCCA AGGAAAGACA CTGTATTCAC GTTGAATTTG ACATTTCTGA ATCGAACTTG 1920

AAATACACCA CCGGTGACCA TCTAGCCATC TGGCCATCCA ACTCCGACGA AAACATCAAG 1980 

CAATTTGCCA AGTGTTTCGG ATTGGAAGAT AAACTCGACA CTGTTATTGA ATTGAAGGCA 2040

TTGGACTCCA CTTACACCAT TCCATTCCCA ACTCCAATTA CTTACGGTGC TGTCATTAGA 2100

CACCATTTAG AAATCTCCGG TCCAGTCTCG AGACAATTCT TTTTGTCGAT TGCTCGGTTT 2160

GCTCCTGATG AAGAAACAAA GAAGACTTTC ACCAGACTTG GTGGTGACAA ACAAGAATTC 2220

GCCACCAAGG TTACCCX3CAG AAAGTTCAAC ATTGCCGATG CCTTGTTATA TTCCTCCAAC 2280

AACAC7CCAT GGTCCGATGT TCCTTTTGAG TTCCTTATTG AAAACATCCA ACACTTGACT 2340

CCACGTTACT ACTCCATTTC TTCTTCGTCG TTGAGTGAAA AACAACTCAT CAATGTTACT 2400 

GCAGTCGTTG AGGCCGAAGA AGAAGCCGAT GGCAGACCAG TCACTGGTGT TGTTACCAAC 2460 

TTGTTGAAGA ACATTGAAAT TGCGCAAAAC AAGACTGGCG AAAAGCCACT TGTTCACTAC 2520 

GATTTGAGCG GCCCAAGAGG CAAGTTCAAC AAGTTCAAGT TGCCAGTGCA CGTGAGAAGA 2580 

TCCAACTTTA AGTTGCCAAA GAACTCCACC ACCXrCAGTTA TCTTGATTGG TCCAGGTACT 2640 

GGTGTTGCCC CATTGAGAGG TTTCGTTAGA GAAAGAGTTC AACAAGTCAA GAATGGTGTC 2700

AATGTTGGCA AGACTTTGTT GTTTTATGGT TGCAGAAACT CCAACGAGGA CTTTTTGTAC 2760

AAGCAAGAAT GGGCXX3AGTA CGCTTCTGTT TTGGGTGAAA ACTTTGAGAT GTTCAATGCC 2820 

TTCTCTAGAC AAGACCCATC CAAGAAGGTT TACX3TCCAGG ATAAGATTTT AGAAAACAGC 2880

CAACTTGTGC ACGAATTGTT GACXX3AAGGT GCCATTATCT ACGTCTGTGG TGACGCCAGT 2940

AGAATGGCCA GAGACGTCCA GACCACGATC TCCAAGATTG TTGCCAAAAG CAGAGAAATC 3000

AGTGAAGACA AGGCCGCTGA ATTGGTCAAG TCCTGGAAAG TCCAAAATAG ATACCAAGAA 3060

GATGTTTGGT AGACTCAAAC GAATCTCTCT TTCTCCCAAC GCATTTATGA ATATTCTCAT 3120 

TGAAGTTTTA CATATGTTCT ATATTTCATT TTTTTTTTAT TATATTACGA AACATAGGTC 3180 

AACTATATAT ACTTGATTAA ATGTTATAGA AACAATAATT ATTATCTACT CGTCTACTTC 3240

TTTGGCATTG GCATTGGCAT TGGCATTGGC ATTGCCGTTG CCGTTGGTAA TGCCGGGATA 3300

TTTAGTACAG TATCTCCAAT CCGGATTTGA GCTATTGTAA ATCAGCTGCA AGTCATTCTC 3360

CACCTTCAAC CAGTACTTAT ACTTCATCTT TGACTTCAAG TCCAAGTCAT AAATATTACA 3420

AGTTAGCAAG AACTTCTGGC CATCCACAAT ATAGACGTTA TTCACGTTAT TATGCGACGT 3480 

ATGGATATGG TTATCCTTAT TGAACTTCTC AAACTTCAAA AACAACCCCA CGTCCCGCAA 3540 

CGTCATTATC AACGACAAGT TCTGACTCAC GTCGTCGGAG CTCGTCAAGT TCTCAATTAG 3600

ATCGTTCTTG TTATTGATCT TCTOGTACIT TCTCAACTGC TGGAACACAT TGTCCTCGTT 3660 

GTTCAAATAO ATCTTGAACA ACTTCTTGAA GGGAATCAAC TTTTCGATCT GGGCCAAGAT 3720

TTCCGCCGGG ATCTTCAGAA ACAAGTCCTG CAACXrCCTGQ TCX3ATGGTCT CGGGGTACAA 3780

CAAGTCTAAG GGGCAGAAGT GTCTAGGCAC GTGTTTCAAC TGGTTCAAGG AACATGTTCX3 3840 

ACAGTAGTTC GAGTTATAOT TATCGTACAA CCACTTTGGC TTGATTTCGA AAATGACGGA 3900

GCTGATCCCA TCATTCTCCT GGTTCCTTTC ATAGTACAAC TGGCATTTCT TCGAGAGACT 3960

CAACTCCTCG TAGTTCCCGT CCAAGATATT CGGCAACAAG AGCCCGTAGC GCTCACGGAG 4020

CATCAAGTCG TGGCCCTGGT TGTTCAACTT GTTGATGAAG TCCGATGTCA AGACAATCAA 4080 

CTGGATGTCG ATGATCTGGT GCGGAAACAA GTTCTTGCAC TTTAGCTCGA TGAAGTCGTA 4140 

CAACT 4145 



(2) INFORMATION FOR SEQ ID NO:83: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 679 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83: 

Met Ala Leu Asp Lys Leu Asp Leu Tyr Val Ile Ile Thr Leu Val Val
1               5                   10                  15

Ala Val Ala Ala Tyr Phe Ala Lys Asn Gln Phe Leu Asp Gln Pro Gln
            20                  25                  30

Asp Thr Gly Phe Leu Asn Thr Asp Ser Gly Ser Asn Ser Arg Asp Val
        35                  40                  45

Leu Leu Thr Leu Lys Lys Asn Asn Lys Asn Thr Leu Leu Leu Phe Gly
    50                  55                  60

Ser Gln Thr Gly Thr Ala Glu Asp Tyr Ala Asn Lys Leu Ser Arg Glu
65                  70                  75                  80

Leu His Ser Arg Phe Gly Leu Lys Thr Met Val Ala Asp Phe Ala Asp
                85                  90                  95

Tyr Asp Trp Asp Asn Phe Gly Asp Ile Thr Glu Asp Ile Leu Val Phe
            100                 105                 110

Phe Ile Val Ala Thr Tyr Gly Glu Gly Glu Pro Thr Asp Asn Ala Asp
        115                 120                 125

Glu Phe His Thr Trp Leu Thr Glu Glu Ala Asp Thr Leu Ser Thr Leu
    130                 135                 140

Lys Tyr Thr Val Phe Gly Leu Gly Asn Ser Thr Tyr Glu Phe Phe Asn
145                 150                 155                 160

Ala Ile Gly Arg Lys Phe Asp Arg Leu Leu Ser Glu Lys Gly Gly Asp
                165                 170                 175

Arg Phe Ala Glu Tyr Ala Glu Gly Asp Asp Gly Thr Gly Thr Leu Asp
            180                 185                 190

Glu Asp Phe Met Ala Trp Lys Asp Asn Val Phe Asp Ala Leu Lys Asn
        195                 200                 205

Asp Leu Asn Phe Glu Glu Lys Glu Leu Lys Tyr Glu Pro Asn Val Lys
    210                 215                 220

Leu Thr Glu Arg Asp Asp Leu Ser Ala Ala Asp Ser Gln Val Ser Leu
225                 230                 235                 240

Gly Glu Pro Asn Lys Lys Tyr Ile Asn Ser Glu Gly Ile Asp Leu Thr
                245                 250                 255

Lys Gly Pro Phe Asp His Thr His Pro Tyr Leu Ala Arg Ile Thr Glu
            260                 265                 270

Thr Arg Glu Leu Phe Ser Ser Lys Asp Arg His Cys Ile His Val Glu
        275                 280                 285

Phe Asp Ile Ser Glu Ser Asn Leu Lys Tyr Thr Thr Gly Asp His Leu
    290                 295                 300

Ala Ile Trp Pro Ser Asn Ser Asp Glu Asn Ile Lys Gln Phe Ala Lys
305                 310                 315                 320

Cys Phe Gly Leu Glu Asp Lys Leu Asp Thr Val Ile Glu Leu Lys Ala
                325                 330                 335

Leu Asp Ser Thr Tyr Thr Ile Pro Phe Pro Thr Pro Ile Thr Tyr Gly
            340                 345                 350

Ala Val Ile Arg His His Leu Glu Ile Ser Gly Pro Val Ser Arg Gln
        355                 360                 365

Phe Phe Leu Ser Ile Ala Gly Phe Ala Pro Asp Glu Glu Thr Lys Lys
    370                 375                 380

Ala Phe Thr Arg Leu Gly Gly Asp Lys Gln Glu Phe Ala Ala Lys Val
385                 390                 395                 400

Thr Arg Arg Lys Phe Asn Ile Ala Asp Ala Leu Leu Tyr Ser Ser Asn
                405                 410                 415

Asn Ala Pro Trp Ser Asp Val Pro Phe Glu Phe Leu Ile Glu Asn Val
            420                 425                 430

Pro His Leu Thr Pro Arg Tyr Tyr Ser Ile Ser Ser Ser Ser Leu Ser
        435                 440                 445

Glu Lys Gln Leu Ile Asn Val Thr Ala Val Val Glu Ala Glu Glu Glu
    450                 455                 460

Ala Asp Gly Arg Pro Val Thr Gly Val Val Thr Asn Leu Leu Lys Asn
465                 470                 475                 480

Val Glu Ile Val Gln Asn Lys Thr Gly Glu Lys Pro Leu Val His Tyr
                485                 490                 495




Asp Leu Ser Gly Pro Arg Gly Lys Phe Asn Lys Phe Lys Leu Pro Val
            500                 505                 510

His Val Arg Arg Ser Asn Phe Lys Leu Pro Lys Asn Ser Thr Thr Pro
        515                 520                 525

Val Ile Leu Ile Gly Pro Gly Thr Gly Val Ala Pro Leu Arg Gly Phe
    530                 535                 540

Val Arg Glu Arg Val Gln Gln Val Lys Asn Gly Val Asn Val Gly Lys
545                 550                 555                 560

Thr Leu Leu Phe Tyr Gly Cys Arg Asn Ser Asn Glu Asp Phe Leu Tyr
                565                 570                 575

Lys Gln Glu Trp Ala Glu Tyr Ala Ser Val Leu Gly Glu Asn Phe Glu
            580                 585                 590

Met Phe Asn Ala Phe Ser Arg Gln Asp Pro Ser Lys Lys Val Tyr Val
        595                 600                 605

Gln Asp Lys Ile Leu Glu Asn Ser Gln Leu Val His Glu Leu Leu Thr
    610                 615                 620


Glu Gly Ala Ile Ile Tyr Val Cys Gly Asp Ala Ser Arg Met Ala Arg
625                 630                 635                 640

Asp Val Gln Thr Thr Ile Ser Lys Ile Val Ala Lys Ser Arg Glu Ile
                645                 650                 655

Ser Glu Asp Lys Ala Ala Glu Leu Val Lys Ser Trp Lys Val Gln Asn
            660                 665                 670

Arg Tyr Gln Glu Asp Val Trp
        675



(2) INFORMATION FOR SEQ ID NO: 84:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 679 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:



Met 


Ala 


Leu 


Asp 


Lys 


Leu 


Asp 


Leu Tyr 


Val 


He 


He 


Thr 


Leu 


Val 


Val 


1 








5 








10 










15 




Ala 


Val 


Ala 


Ala 
20 


Tyr 


Phe 


Ala 


Lys Asn 
25 


Gin 


Phe 


Leu 


Asp 


Gin 
30 


Pro 


Gin 


Asp 


Thr 


Gly 
35 


Phe 


Leu 


Asn 


Thr 


Asp Ser 
40 


Gly 


Ser 


Asn 


Ser 
45 


Arg 


Asp 


Val 


Leu 


Leu 
50 


Thr 


Leu 


Lys 


Lys 


Asn 
55 


Asn Lys 


Asn 


Thr 


Leu 
60 


Leu 


Leu 


Phe 


Gly 


Ser 


Gin 


Thr 


Gly 


Thr 


Ala 


Glu 


Asp Tyr 


Ala 


Asn 


Lys 


Leu 


Ser 


Arg 


Glu 


65 










70 








75 










80 


Leu 


His 


Ser 


Arg 


Phe 
85 


Gly 


Leu 


Lys Thr 


Met 
90 


Val 


Ala 


Asp 


Phe 


Ala 
95 


Asp 


Tyr 


Asp 


Trp 


Asp 
100 


Asn 


Phe 


Gly 


Asp He 
105 


Thr 


Glu 


Asp 


He 


Leu 
110 


val 


Phe 


Phe 


He 


Val 
115 


Ala 


Thr 


Tyr 


Gly 


Glu Gly 
120 


Glu 


Pro 


Thr 


Asp 
125 


Asn 


Ala 


Asp 


Glu 


Phe 
130 


His 


Thr 


Trp 


Leu 


Thr 
135 


Glu Glu 


Ala 


Asp 


Thr 
140 


Leu 


Ser 


Thr 


Leu 


Arg 


Tyr 


Thr Val Phe Gly Leu Gly Asn Ser Thr Tyr Glu Phe Phe Asn 

145 150 155 160 

Ala Ile Gly Arg Lys Phe Asp Arg Leu Leu Ser Glu Lys Gly Gly Asp 

165 170 175 

Arg Phe Ala Glu Tyr Ala Glu Gly Asp Asp Gly Thr Gly Thr Leu Asp 

180 185 190 

Glu Asp Phe Met Ala Trp Lys Asp Asn Val Phe Asp Ala Leu Lys Asn 

195 200 205 

Asp Leu Asn Phe Glu Glu Lys Glu Leu Lys Tyr Glu Pro Asn Val Lys 

210 215 220 

Leu Thr Glu Arg Asp Asp Leu Ser Ala Ala Asp Ser Gln Val Ser Leu 

225 230 235 240 

Gly Glu Pro Asn Lys Lys Tyr Ile Asn Ser Glu Gly Ile Asp Leu Thr 

245 250 255 

Lys Gly Pro Phe Asp His Thr His Pro Tyr Leu Ala Arg Ile Thr Glu 

260 265 270 

Thr Arg Glu Leu Phe Ser Ser Lys Glu Arg His Cys Ile His Val Glu 

275 280 285 

Phe Asp Ile Ser Glu Ser Asn Leu Lys Tyr Thr Thr Gly Asp His Leu 

290 295 300 

Ala Ile Trp Pro Ser Asn Ser Asp Glu Asn Ile Lys Gln Phe Ala Lys 

305 310 315 320 

Cys Phe Gly Leu Glu Asp Lys Leu Asp Thr Val Ile Glu Leu Lys Ala 

325 330 335 

Leu Asp Ser Thr Tyr Thr Ile Pro Phe Pro Thr Pro Ile Thr Tyr Gly 

340 345 350 

Ala Val Ile Arg His His Leu Glu Ile Ser Gly Pro Val Ser Arg Gln 

355 360 365 

Phe Phe Leu Ser Ile Ala Gly Phe Ala Pro Asp Glu Glu Thr Lys Lys 

370 375 380 

Thr Phe Thr Arg Leu Gly Gly Asp Lys Gln Glu Phe Ala Thr Lys Val 

385 390 395 400 

Thr Arg Arg Lys Phe Asn Ile Ala Asp Ala Leu Leu Tyr Ser Ser Asn 

405 410 415 

Asn Thr Pro Trp Ser Asp Val Pro Phe Glu Phe Leu Ile Glu Asn Ile 

420 425 430 

Gln His Leu Thr Pro Arg Tyr Tyr Ser Ile Ser Ser Ser Ser Leu Ser 

435 440 445 

Glu Lys Gln Leu Ile Asn Val Thr Ala Val Val Glu Ala Glu Glu Glu 

450 455 460 

Ala Asp Gly Arg Pro Val Thr Gly Val Val Thr Asn Leu Leu Lys Asn 

465 470 475 480 

Ile Glu Ile Ala Gln Asn Lys Thr Gly Glu Lys Pro Leu Val His Tyr 

485 490 495 

Asp Leu Ser Gly Pro Arg Gly Lys Phe Asn Lys Phe Lys Leu Pro Val 

500 505 510 

His Val Arg Arg Ser Asn Phe Lys Leu Pro Lys Asn Ser Thr Thr Pro 

515 520 525 

Val Ile Leu Ile Gly Pro Gly Thr Gly Val Ala Pro Leu Arg Gly Phe 

530 535 540 

Val Arg Glu Arg Val Gln Gln Val Lys Asn Gly Val Asn Val Gly Lys 

545 550 555 560 

Thr Leu Leu Phe Tyr Gly Cys Arg Asn Ser Asn Glu Asp Phe Leu Tyr 

565 570 575 

Lys Gln Glu Trp Ala Glu Tyr Ala Ser Val Leu Gly Glu Asn Phe Glu 

580 585 590 

Met Phe Asn Ala Phe Ser Arg Gln Asp Pro Ser Lys Lys Val Tyr Val 

595 600 605 

Gln Asp Lys Ile Leu Glu Asn Ser Gln Leu Val His Glu Leu Leu Thr 

610 615 620 

Glu Gly Ala Ile Ile Tyr Val Cys Gly Asp Ala Ser Arg Met Ala Arg 

625 630 635 640 

Asp Val Gln Thr Thr Ile Ser Lys Ile Val Ala Lys Ser Arg Glu Ile 

645 650 655 

Ser Glu Asp Lys Ala Ala Glu Leu Val Lys Ser Trp Lys Val Gln Asn 

660 665 670 

Arg Tyr Gln Glu Asp Val Trp 

675 

(2) INFORMATION FOR SEQ ID NO:85: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4115 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85: 

CATATGCGCT AATCTTCTTT TTCTTTTTAT CACAGGAGAA ACTATCCCAC CCCCACTTCG 60 

AAACACAATG ACAACTCCTG C6TAACTTGC AAATTCTPGT CTGACTAATT GAAAACTCCG 120 

GAOGA6TCAG ACCTCCAGTC AAACGGACA6 ACAGACAATU: ACTTGGTGCG ATGTTCATAC 180 

CTACAGACAT GTCAACGGGT GTTAGACGAC GGTTTCTTGC AAAGACAGGT GTTGGCATCT 240 

CGTACGATGG CAACTGCAGG AGGTGTCGAC TTCTCCTTTA GGCAATAGAA AAAGACTAAG 300 

AGAACAGCGT TTTTACAGGT TGCATTGGTT AATGTAGTAT TTTTTTAGTC CCAGCATTCT 360 

GTGGGTTGCT CTGGGTTTCT AGAATAGGAA ATCACAGGAG AATGCAAATT CAGATGGAAG 420 

AACAAAGAGA TAAAAAACAA AAAAAAACTG AGTTTTGCAC CAATAGAATG TTTGATGATA 480 

TCATCCACTC GCTAAACGAA TCATGTGGGT GATCTTCTCT TTAGTTTTGG TCTATCATAA 540 

AACACATGAA AGTGAAATCC AAATACACTA CACTCCGGGT ATTGTCCTTC GTTTTACAGA 600 

TGTCTCATTG TCTTACTTTT GAGGTCATAG GAGTTGCCTG TGAGAGATCA CAGAGATTAT 660 

CACACTCACA TTTATCCTAG TTTCCTATCT CATGCTGTGT GTCTCTGGTT GGTTCATGAG 720 

TTTG6ATTGT TGTACATTAA AGGAATCGCT GGAAAGGAAA GCTAACTAAA TTTTCTTTGT 780 

CACAGGTACA CTAACCTGTA AAACTTCACT GCCACGCCAG TCTTTCCTGA TTGGGCAAGT 840 

GCACAAACTA CAACCTGCAA AACAGCACTC CGCTTGTCAC AGGTTGTCTC CTCTCAACCA 900 

ACAAAAAAAT AAGATTAAAC TTTCTTTGCT CATGCATCAA TCGGAGTTAT CTCTGAAAGA 960 

GTTGCCTTTG TGTAATGTGT GCCAAACTCA AACTGCAAAA CTAACCACAG AATGATTTCC 1020 

CTCACAATTA TATAAACTCA CCCACATTTC CACAGACCGT AATTTCATCT CTCACTTTCT 1080 

CTTTTOCTCT TCTTTTACTT AGTCAGGTTT GATAACTTCC TTTTTTATTA CCCTATCTTA 1140 

TTTATTTATT TATTCATTTA TACCAACCAA CCAACCATGG CCACACAAGA AATCATCGAT 1200 

TCTGTACTTC CGTACTTGAC CAAATGGTAC ACTGTGATTA CTGCAGCAGT ATTAGTCTTC 1260 

CTTATCTCCA CAAACATCAA GAACTACGTC AAGGCAAAGA AATTGAAATG TGTCGATCCA 1320 

CCATACTTGA AGGATGCCGG TCTCACTGGT ATTCTGTCTT TGATCGCCGC CATCAAGGCC 1380 

AAGAACGACG GTAGATTGGC TAACTTT6CC QATGAAGTTT TCGACGAGTA CCCAAACCAC 1440 

ACCTTCTACT TGTCTGTTGC CGGTGCTTTG AAGAtTGTCA TGACTGTTGA CCCAGAAAAC 1500 

ATCAAGGCTG TCTTGGCCAC CCAATTCACT GACTTCTCCT TGGGTACCAG ACACGCCCAC 1560 

TTTGCTCCTT TGTTGGGTGA CGGTATCTTC ACCTTGGACG GAGAAGGTTG GAAGCACTCC 1620 

AGAGCTATGT TGAGACCACA GTTTGCTAGA GACCAGATTG GACAOGTTAA AGCCTTGGAA 1680 

CCACACATCC AAATCAT6GC TAAGCAGATC AAGTTGAACC AGGGAAAGAC TTTOGATATC 1740 

CAAGAATTGT TCTTTAGATT TACCOTCGAC ACGGCTACTG AGTTCTTGTT TGGTGAATCC 1800 

GTTCACTCCT TGTACGATGA AAAATTGGGC ATCCCAACTC CAAACGAAAT CCCAGGAAGA 1860 

GAAAACTTTG CCGCTGCTTT CAACGTTTCC CAACACTACT TGGCCACCAG AAGTTACTCC 1920 

CAGACTTTTT ACTTTTTGAC CAACCCTAAG GAATTCAGAG ACTGTAACGC CAAGGTCCAC 1980 

CACTTGGCCA A6TACTTT6T CAACAAGGCC TTGAACTTTA CTCCTGAAGA ACTCGAAGAG 2040 

AAATCCAAGT CCGGTTACGT TTTCTTGTAC GAATTGGTTA AGCAAACCAG AGATCCAAAG 2100 

GTCTTGCAAG ATCAATTGTT GAACATTATG GTTGCCGGAA GAGACACCAC TGCCGGTTTG 2160 

TTGTCCTTTG CTTTGTTTGA ATTGGCTAGA CACCCAGAGA TGTGGTCCAA GTTGAGAGAA 2220 

GAAATCGAAG TTAACTTTGG TGTTGGTGAA GACTCCCGCG TTGAAGAAAT TACCTTCGAA 2280 

GCCTTGAAGA GATGTGAATA CTTGAAGGCT ATCCTTAACG AAACCTTGCG TATGTACCCA 2340 

TCTGTTCCTG TCAACTTTAG AACCGCCACC AGAGACACCA CTTTGCCAAG AGGTGGTGGT 2400 

GCTAACGGTA CCQACCCAAT CTACATTCCT AAAGGCTCCA CTGTTGCTTA CGTT6TCTAC 2460 
AAGACCCACC GTTTGGAAGA ATACTACGGT AAGGACGCTA ACGACTTCAG ACCAGAAAGA 2520 

T6GTTTGAAC CATCTACTAA GAAGTTGGGC TGGGCTTATG TTCCATTCAA CX3GTG6TCCA 2580 

AGAGTCT6CT TGGGTCAACA ATTC6CCTTG ACTGAA6CTT CTTAT6TGAT CACTAGATTG 2640 

GCCCAOATGT TTGAAACT6T CTCATCTGAT CCAGGTCTCG AATACCCTCC ACCAAAGT6T 2700 

ATTCACTTGA CCATGAGTCA CAACGATGGT GTCTTTGTCA AGATGTAAAG TAGTCGATGC 2760 

TGGGTATTCG ATTACATGTG TATAGGAAGA TTTTGGTTTT TTATTCGTTC TTTTTTTTAA 2820 

TTTTTGTTAA ATTAGTTTAO AGATTTCATT AATACATAGA TG6GTGCTAT TTCCGAAACT 2880 

TTACi'rCTAT CCCCTGTATC CCTTATTATC CCTCTCAGTC ACATGATTGC TGTAATTGTC 2940 

6T6CA6GACA CAAACTCCCT AACGGACTTA AACCATAAAC AAGCTCAGAA CCATAA6CC6 3000 

ACATCACTCC TTCTTCTCTC TTCTCCAACC AATAGCATGG ACAGACCCAC CCTCCTATCC 3060 

GAATCGAAGA CCCTTATTGA CTCCATACCC ACCTGGAAGC CCCTCAAGCC ACACACGTCA 3120 

TCCA6CCCAC CCATCACCAC ATCCCTCTAC TCGACAACGT CCAAAGACGG CGAGTTCTGG 3180 

TGT6CCC6GA AATCAGCCAT CXXX3GCCACA TACAA6GAGC CGTTGATTGC GTGCATACTC 3240 

G6CGA6CCCA CAATGGGAGC CACGCATTCG GACXATGAAG CAAAGTACAT TCACX3AGATC 3300 

ACGGGTGTTT CAGTGTCGCA GATTGA6AAG TTCGACGATG GATGGAAGTA CGATCTCGTT 3360 

GCGGATTACG ACTTCGGTGG GTTGTTATCT AAACGAAGAT TCTATGAGAC GCAGGATGTQ 3420 

TTTCGOTTC6 AGGATT6TGC GTACGTCAIG AGTGTGCCTT TT6ATGGACC CAAGGAGGAA 3480 

GGTTACGTGG TTGGGAC6TA CAGATCCATT GAAAGGTT6A GCTGGG6TAA AGACGGGGAC 3540 

GTGGAGTGGA CCATGGCGAC GACGTCGGAT CCTGGTGGGT TTATCCCGCA ATGGATAACT 3600 

CGATTGAGCA TCCCTGGAGC AATCGCAAAA GATGTGCCTA GTGTATTAAA CTACATACAG 3660 

AAATAAAAAC GTGTCTTGAT TCATTGGTTT GGTTCTTGTT GGGTTCCGAG CCAATATTTC 3720 

ACATGATCTC CTAAATTCTC CAAGAATCCC AACGTAGCGT AGTCCAGCAC GCCCTCTGAG 3780 

ATCrTATTTA ATATCGACTT CTCAACCACC GGTGGAATCC CGTTCAGACC ATTGTTACCT 3840 

GTAGTGTGTT TGCTCTTGTT CTTGATGACA ATGATGTATT TGTCACGATA CCTGAAATAA 3900 

TAAAACATCC AGTCATTGAG CTTATTACTC GTGAACTTAT GAAAGAACTC ATTCAAGCCG 3960 

TTCCCAAAAA ACCC3^GAATT GAAGATCTTG CTCAACTGGT CATGCAAGTA GTAGATCX3CC 4020 

ATGATCTGAT ACTTTACCAA GCTATCCTCT CCAAGTTCTC CCACGTACGG CAAGTACG6C 4080 

AACGAGCTCT GGAAGCTTTG TTGTTTGGGG TCATA 4115 

(2) INFORMATION FOR SEQ ID NO: 86: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3948 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86: 

GACCTGTGAC GCTTCCGGTG TCTTGCCACC AGTCTCCAAG TTGACCGACG CCCAAGTCAT 60 

GTACCACTTT ATTTCCGGTT ACACTTCCAA GATGGCTGGT ACTGAAGAAG GTGTCACGGA 120 

ACCACAAGCT ACTTTCTCCG CTTGTTTCGG TCAACCATTC TTGGTGTTGC ACCCAATGAA 180 

GTACGCTCAA CAATTGTCTG ACAAGATCTC GCAACACAAG GCTAACGCCT GGTTGTTGAA 240 

CACCGGTTGG GTTGGTTCTT CTGCTGCTAG AGCTGGTAAG AGATGCTCAT TGAAQTACAC 300 

CAGAGCCATT TTGGAOGCTA TCCACTCTGG TGAATTGTCC AAGGTTGAAT ACGAAACTTT 360 

CCCAGTCTTC AACTTGAATG TCCCAACCTC CTGTCCAGGT GTCCCAAGTG AAATCTTGAA 420 

CCCAACCAAG GCCTGGACCG GAAGGTGTTG ACTCCTTCAA CAAGGAAATC AAGTCTTTGG 480 

CTGGTAAGTT TGCTGAAAAC TTCAAGACCT ATGCTGACCA AGCTACCGCT GAAGTGAGAG 540 

CTGCAGGTCC AGAAGCTTAA AGATATTTAT TCATTATTTA GTTTGCCTAT TTATTTCTCA 600 

TTACCCATCA TCATTCAACA CTATATATAA AGTTACTTCG GATATCATTG TAATCGTGCG 660 

TGTCGCAATT GGATGATTTG GAACTGCGCT TGAAACGGAT TCATGCACGA AGCGGAGATA 720 

AAAGATTACG TAATTTATCT CCTGAGACAA TTTTAGCCGT GTTCACACGC CCTTCTTTGT 780 

TCTGAGCGAA GGATAAATAA TTAGACTTCC ACAGCTCATT CTAATTTCCG TCACGCGAAT 840 

ATTGAAGGGG GGTACATGTG GCCGCTGAAT GTGGGGGCAG TAAACGCAGT CTCTCCTCTC 900 

CCAGGAATAG TGCAACGGAG GAAGGATAAC GGATAGAAAG CGGAATGCGA GGAAAATTTT 960 

GAACGCGCAA GAAAAGCAAT ATCCGGGCTA CCAGGTTTTG AGCCAGGGAA CACACTCCTA 1020 

TTTCTGCTCA ATGACTGAAC ATAGAAAAAA CACCAAGACG CAATGAAACG CACATGGACA 1080 

TTTAGACCTC CCCACATGTG ATAGTTTGTC TTAACAGAAA AGTATAATAA GAACCCATGC 1140 

CGTCCCTTTT CTTTCGCCGC TTCAACTTTT TTTTTTTTAT CTTACACACA TCACGACCAT 1200 

GACTGTACAC GATATTATCG CCACATACTT CACCAAATGG TACGTGATAG TACCACTCGC 1260 

TTTGATTGCT TATAGA6TCC TCGACTACTT CTATGGCAGA TACTTGATGT ACAAGCTTGG 1320 

TGCTAAACCA TTTTTCCAGA AACAGACAGA CGGCTGTTTC GGATTCAAAG CTCCGCTTQA 1380 

ATTGTTGAAG AAGAAGAGCG ACGGTACCCT CATAGACTTC ACACTCCAGC GTATCCACGA 1440 

TCTCGATCGT CCCGATATCC CAACTTTCAC ATTCCCGGTC TTTTCCATCA ACCTTGTCAA 1500 

TACCCTTGA6 CXrGGAGAACA TCAAGGCCAT CTTGGCCACT CAGTTCAACG ATTTCTCCTT 1560 

GGGTACCAGA CACTG6CACT TIGCTCCTTT GTTGGGTGAT GGTATCTTTA C6TT6GAT66 1620 

C6C066CTGG AAGCACAGCA GATCTAT6TT GAOACCACAG TTTGCCAGA6 AACAGATTTC 1680 

CCACGTCAAG TTGTTGGAGC CACACGTTCA GGTGTTCTTC AAACACGTCA GAAAGGCACA 1740 

GGGCAAGACT TTTGACATCC AGGAATTGTT TTTCAGATTG ACCGTCGACT CCGCCACCX3A 1800 

GTTTTTGTTT GGTGAATCCG TTGAGTCCTT GAGAGATGAA TCTATCGGCA TGTCCATCAA 1860 

TGOGCTTGAC TTTGACGGCA AGGCTGGCTT TGCTGATGCT TTTAACTATT CGCAGAATTA 1920 

TTTGGCTTCG AGA6CGGTTA TGCAACAATT GTACTGGGTG TTGAACGGGA AAAAGTTTAA 1980 

GGAGTGCAAC GCTAAAGTGC ACAAGTTTGC TGACTACTAC GTCAACAAGG CTTTGGACTT 2040 

GACGCCTGAA CAATTGGAAA AGCAGGATGG TTATGTGTTT TTGTACGAAT TGGTCAAGCA 2100 

AACCAGAGAC AAGCAAGTGT TGA6AGACCA ATTGTTGAAC ATCATGGTTG CTGGTAGAGA 2160 

CACXaCCGCC GGTTTGTTGT CGTTTGTTTT CTTTGAATTG GCCAGAAACC CAGAAGTTAC 2220 

CAACAA6TT6 AGA6AAGAAA TTGAGGACAA GTTTGGACTC GGTGAGAAT6 CTAGTGTTGA 2280 

AGACATTTCC TTTGAGTCGT TGAAGTCCTG TGAATACTTG AAGGCTGTTC TCAACGAAAC 2340 

CTTGAGATTG TACCCATCCG TGCCACAGAA TTTCAGAGTT GCCACCAAGA ACACTACCCT 2400 

CCCAAGAGGT GGTGGTAAGG ACGGGTTGTC TCCTGTTTTG GTGAGAAAGG GTCAGACCGT 2460 

TATTTACGGT GTCTACGCAG CCCACAGAAA CCCAGCT6TT TACGCTAAGG ACGCTCTTGA 2520 

GTTTAGACCA GAGAGATGGT TTGAGCCAGA GACAAAGAAG CTTGGCTGGG CCTTCCTCCC 2580 

ATTCAACGGT GGTCCAAGAA TCTGTTTGGG ACAGCAGTTT GCCTTGACAG AAGCTTCGTA 2640 

TGTCACTGTC AGGTTGCTCC AGGAGTTTGC ACACTTCTCT ATGGACCCAG ACACCGAATA 2700 

TCCACCTAA6 AAAATGTCGC ATTTGACCAT GTCGCTTTTC GACGGTGCCA ATATTGAGAT 2760 

6TATTAGAGG GTCATGTGTT ATTTTGATTG TTTAGTTTGT AATTACT6AT TAGGTTAATT 2820 

CATGGATTGT TATTTATTGA TAGG6GTTT0 CGCGTGTTGC ATTCACTTGG GATCGTTCCA 2880 

GGTTGATGTT TCCTTCCATC CTGTCGAGTC AAAAGGA6TT TTGTTTTGTA ACTCCGGACG 2940 

AT6TTTTAAA TA6AAG6TC6 ATCTCCATGT GATT6TTTTG ACT6TTACTG TGATTATGTA 3000 

ATCTGCGGAC GTTATACAAG CATGTGATTG TGGTTTTGCA GCCTTTTGCA CGACAAATGA 3060 

TCX3TCAGACG ATTACGTAAT CTTTGTTAGA 6GGGTAAAAA AAAACAAAAT GGCAGCCAGA 3120 

ATTTCAAACA TTCT6CAAAC AAT6CAAAAA AT6GGAAACT CCAACAGACA AAAAAAAAAA 3180 

CTCCX3CAGCA CTCC6AACCC ACAOAACAAT GGGGC6CCAG AATTATTOAC TATTGTQACT 3240 

TTTTTAC6CT AACGCTCATT GCAGTGTAGT 6C6TCTTACA C6GG6TATTG CTTTCTACAA 3300 

TGCAAGGGCA CAGTTGAAGG TTTGCACCTA ACGTT6CCCC GTGTCAACTC AATTTGACGA 3360 

6TAACTTCCT AAGCTCGAAT TATGCAGCTC GTGCGTCAAC CTATGTGCAG GAAAGAAAAA 3420 

ATCCAAAAAA ATOGAAAATG C6ACTTTCGA TTTTGAATAA ACCAAAAAGA AAAATGTCGC 3480 

ACTTTTTTCT CGCTCTCGCT CTCTCGACCC AAATCTkCAAC AAATCCTCGC GCGCAGTATT 3540 

TC6ACGAAAC CACAACAAAT AAAAAAAACA AATTCTACAC CACTTCTTTT TCTTCACCAG 3600 

TCAACAAAAA ACAACAAATT ATACACCATT TCAACGATTT TTGCTCTTAT AAATGCTATA 3660 

TAATGGTTTA ATTCAACTCA GGTATGTTTA TTTTACTGTT TTCAGCTCAA GTATGTTCAA 3720 

ATACTAACTA CTTTTGATGT TTGTCGCTTT TCTAGAATCA AAACAACGCC CACAACACGC 3780 

06A6CTT6TC GAATAGACGG TTT6TTTACT CATTAGATGG TCCCAGATTA CTTTTCAAGC 3840 

CAAAGTCTCT CGAGTTTTGT TTGCTGTTTC CCCAATTCCT AACTATQAAG 6GTTTTTATA 3900 

AOGTCCAAAG ACCCCAAGGC ATA6TTTTTT TGGTTCCTTC TTGTCGTG 3948 

(2) INFORMATION FOR SEQ ID NO: 87: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3755 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 

GCTCAACAAT TGTCTGACAA GATCTCGCAA CACAAGGCTA ACGCCTGGTT GTTGAACACT 60 

GGTTGG6TTG GTTCTTCTGC TGCTAGAGGT GGTAAGAGAT GTTCATTGAA 6TACACCA6A 120 

6CCATTTTG6 A06CTATCCA CTCT6GTGAA TTGTCCA/yGG TTGAATAC6A GACTTTCCCA 180 

GTCTTCAACT TGAATGTCCC AACCTCCTGC CCA6GTGTCC CAAGTGAAAT CTTGAACCCA 240 

ACCAAGGCCT GGACCGAAGG TGTTGACTCC TTCAACAAGG AAATCAAGTC TTTGGCTGGT 300 

AAGTTTGCTG AAAACTTCAA GACCTATGCT GACCAAGCTA CCGCTGAAGT TAGAGCTGCA 360 

GGTCCAGAAG CTTAAAGATA TTTATTCACT ATTTA6TTTG CCTATTTATT TCTCATCACC 420 

CATCATCATT CAACAATATA TATAAAGTTA TTTCGGAACT CATATATCAT T6TAAT0GTG 480 

CGTGTTGCAA TTGGGTAATT TGAAACTGTA GTTGGAACGG ATTCATGCAC GAT6CGGAGA 540 

TAACACGAGA TTATCTCCTA AGACAATTTT GGCCTCATTC ACACGCCCTT CTTCTGAGCT 600 

AAGGATAAAT AATTAGACTT CACAAGTTCA TTAAAATATC CGTCACGCGA AAACTGCAAC 660 

AATAA6GAAO G6GGGGGTAG AC6TAGCCGA TGAATGTGGG 6TGCCAGTAA ACGCAGTCTC 720 

TCTCTCCCCC CCCCCCCCCC CCCCCTCAGG AATA6TACAA CGG6GGAAGG ATAACGGATA 780 

GCAAGTGGAA TGCGAGGAAA ATTTTGAATG CGCAAGGAAA GCAATATCCG GGCTATCAGG 840 

TTTTGAGCCA GGGGACACAC TCCTCTTCTG CACAAAAACT TAACGTAGAC AAAAAAAAAA 900 

AACTCCACCA AGACACAAT6 AATCGCACAT GGACATTTAG ACCTCCCCAC ATGTGAAAGC 960 

TTCTCTGGCG AAAGCAAAAA AAGTATAATA AGGACCCATG CCTTCCCTCT TCCTGGGCCG 1020 

TTTCAACTTT TTCTTTTTCT TTGTCTATCA ACACACACAC ACCTCACGAC CATQACTGCA 1080 

CAGGATATTA TCGCCACATA CATCACCAAA TGGTACGTGA TAGTACCACT CGCTTTGATT 1140 

GCTTATAGGG TCCTCGACTA CTTTTACGGC AGATACTTGA TGTACAAGCT TGGT6CTAAA 1200 

CCGTTTTTCC AGAAACAAAC AGACGGTTAT TTCGGATTCA AAGCTCCACT TGAATTGTTA 1260 

AAAAAGAAGA 6TGACGGTAC CCTCATAGAC TTCACTCTCG AGCGTATCCA AGCGCTCAAT 1320 

CGTCCAGATA TCCCAACTTT TACATTCCCA ATCTTTTCCA TCAACCTTAT CAGCACCCTT 1380 

GAGCCGGAGA ACATCAAGGC TATCTTGGCC ACCCA6TTCA ACGATTTCTC CTT6GGCACC 1440 

AGACACTCGC ACTTTOCTCC TTTGTTGGGC GATGGTATCT TTACCTTGGA COGTOCCGGC 1500 

TGGAAGCACA GCAGATCTAT GTTGAGACCA CAGTTTGCCA GAGAACAGAT TTCCCACGTC 1560 

AA0TTGTTGG AGCCACACAT GCAGGTGTTC TTCAAGCACG TCAGAAAGGC ACAGGGCAAG 1620 

ACTTTTGACA TCGAAGAATT 6TTTTTCAGA TTGACC6TCG ACTCCGCCAC TGAGTTTTTG 1680 

TTTGGTGAAT COGTTGAGTC CTTGAGAGAT GAATCTATTG GGAT6TCCAT CAATGCACTT 1740 

QACTTTGACG 6CAA6GCTGG CTTTGCTGAT GCTTTTAACT ACTCGCA6AA CTATTTG6CT 1800 

TCQAGAGCGG TTATGCAACA ATTGTACTGG GTGTTGAACG GGAAAAAGTT TAAGGAGTGC 1860 

AACGCTAAAG TGCACAAGTT TGCTGACTAT TACGTCAGCA AGGCTTTGGA CTTGACACCT 1920 

GAACAATTGG AAAAGCAGGA TGGTTAT8TG TTCTTGTACG A6TT6GTCAA GCAAACCAGA 1980 

GACAGGCAAG TGTTGAGAOA CCAGTTGTTG AACATCATGG TTGCC6GTAG AGACACCACC 2040 

GCCGGTTTGT TGTCGTTTGT TTTCTTT6AA TTGGCCAGAA ACCCAGA6GT GACCAACAAG 2100 

TTGAGAGAAG AAATCGT^GGA CAAGTTTGGT CTTGGTGAGA ATGCTCGTGT TQAA6ACATT 2160 

TCCTTTGAGT CGTTGAAGTC ATGTGAATAC TTGAAGGCTG TTCTCAACGA AACTTTGAGA 2220 

TTGTACCCAT CCGTGCCACA GAATTTCAGA GTTGCCACCA AAAACACTAC CCTTCCAAGG 2280 

6GAGGTGGTA AGGACGG6TT ATCTCCT6TT TT6GTCAGAA AGG6TCAAAC CGTTATGTAC 2340 

GGTGTCTAC6 CTGCCCACAG AAACCCAGCT GTCTACGGTA AGGAC6CCCT TGAGTTTAGA 2400 

CCAGAGAGGT GGTTTGAGCC AGAGACAAAG AAGCTTGGCT GGGCCTTCCT TCCATTCAAC 2460 

GGTGGTCCAA GAATTTGCTT GGGACAGCAG TTTGCCTTGA CAGAAGCTTC GTATGTCACT 2520 

GTCAGATTGC TCCAAGAGTT TGGACACTTG TCTATGGACC CCAACACCGA ATATCCACCT 2580 

AGGAAAATGT CGCATTTGAC CATGTCCCTT TTCGACG6T6 CCAACATTGA 6ATGTATTAG 2640 

AGGATCATGT GTTATTTTTG ATTGGTTTAG TCTGTTTGTA GCTATTGATT AGGTTAATTC 2700 

ACGGATTGTT ATTTATTGAT AGGGGGTGCG TGTGTGTGTG TGTGTTGCAT TCACATGGGA 2760 

TCGTTCCAGG TrGTTGTTTC CTTCCATCCT GTTGAGTCAA AAGGAGTTTT GTTTTGTAAC 2820 

TCCGGACQAT QTCTTAGATA GAAGGTCGAT CTCCATGTGA TTGTTTGACT GCTACTCTGA 2880 

TTATGTAATC TGTAAA6CCT AGACGTTATG CAAGCATGTG ATTGTGGTTT TTGCAACCTG 2940 

TTTGCACGAC AAATGATCGA CAGTCGATTA CGTAATCCAT ATTATTTAGA G6GGTAATAA 3000 

AAAATAAATG GCAGCCAGAA TTTCAAACAT TTTGCAAACA ATGCAAAAGA TGAGAAACTC 3060 

CAACAGAAAA AATAAAAAAA CTCCGCAGCA CTCCGAACCA ACAAAACAAT 6GGGGGCGCC 3120 

AQAATTATTG ACTATTGTGA CTTTTTTTTA TTTTTTCCGT TAACTTTCAT TGCAGTGAAG 3180 

TGTGTTACAC GGGGTGGTGA TGGTGTTGGT TTCTACAATG CAAGGGCACA GTTGAA66TT 3240 

TCCACATAAC GTTGCACCAT ATCAACTCAA TTTATCCTCA TTCAT6TGAT AAAAGAAGA6 3300 
CCAAAAGGTA ATTGGCAGAC CCCCCAAGGG GAACACGGAG TAGAAAGCAA TGGAAACACG 3360 

CCCATGACAG TGCCATTTAG CCCACAACAC ATCTAGTATT CTTTTTTTTT TTTGTGCGCA 3420 

G6TGCACACC TG6ACTTTA6 TTATTGCCCC ATAAAGTTAA CAATCTCACC TTTGGCTCTC 3480 

CCAGTGTCTC CGCCTCCAGA TGCTCGTTTT ACACCCTCGA GCTAACGACA ACACAACACC 3540 

CATGAGGGGA ATGGGCAAAG TTAAACACTT TTGGTTTCAA TGATTCCTAT TTGCTACTCT 3600 

CrrGTTTTGT GTTTTGATTT GCACCATCTG AAATAAACGA CAATTATATA TACCTTTTCG 3660 

TCTGTCCTCC AATGTCTCTT TTT6CTGCCA TTTTGCTTTT TGCTTTTTGC TTTTGCACTC 3720 

TCTCCCACTC CCACAATCAG T6C3^6CAACA CACAA 3755 

(2) INFORMATION FOR SEQ ID NO: 88: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3900 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88: 

GACATCATAA TGACCCGGTT ATTTCGCCCT CAGGTTGCTT ATTTGA6CCG TAAAGTGCAS 60 

TAGAAACTTT GCCTTGGGTT CAAACTCTAG TATAATGGTG ATAACTGGTT GCACTCTIGC 120 

CATAGGCATG AAAATAGGCC GTTATAGTAC TATATTTAAT AAGCGTAGGA GTATAGGATG 180 

CATATGACCG GTTTTTCTAT ATTTTTAAGA TAATCTCTAG TAAATTTTGT ATTCTCAGTA 240 

GGATTTCATC AAATTTCGCA ACCAATTCTG GCGAAAAAAT GATTCTTTTA CGTCAAAAGC 300 

TGAATA6T6C AGTTTAAAGC ACCTAAAATC ACATATACAG CCTCTAGATA CGACAGAGAA 360 

6CTCTTTATG ATCTGAAGAA GCATTAGAAT A6CTACTAT6 A6CCACTATT GGTGTATATA 420 

TTAGGGATTG GTGCAATTAA GTACGTACTA ATAAACAGAA GAAAATACTT AACCAATTTC 480 

TGGTGTATAC TTAGTGGTGA GGGACCTTTT CTGAACATTC GG6TCAAACT TTTTTTTGGA 540 

GTGCGACATC GATTTTTCGT TTGTGTAATA ATAGTGAACC TTTGTGTAAT AAATCTTCAT 600 

GCAAGACTTG CATAATTCGA GCTTGGGA6T TCACGCCAAT TTGACCTCGT TCATGTGATA 660 

AAAGAAAAGC CAAAAGGTAA TTAGCAGACG CAATGGGAAC ATGGAGTGGA AAGCAATGGA 720 

AOCACGCCCA GGACGGA0TA ATTTAGTCCA CACTACATCT GGGGGTTTTT TTTTTGTGCO 780 

CAAGTACACA CCTG6ACTTT AGTTTTTGCC CCATAAA6TT AACAATCTAA CCTTTGGCTC 840 

TCCAACTCTC TCCGCCCCCA AATATTCGTT TTTACACCCT CAAGCTAGCG ACAGCACAAC 900 

ACCCATTAGA GGAATGGGGC AAAGTTAAAC ACTTTTGGCT TCAATGATTC CTATTCGCTA 960 

CTACATTCTT CTCTTGTTTT GT6CTTTGAA TTGCACCATG TGAAATAAAC GACAATTATA 1020 

TATACCTTTT CATCCCTCCT CCTATATCTC TTTTTGCTAC ATTTTGTTTT TTACGTTTCT 1080 

TGCTTTTGCA CTCTCCCACT CCCACAAAGA AAAAAAAACT ACACTATGTC GTCTTCTCCA 1140 

TCGTTTGCCC AAQAGGTTCT CGCTACCACT AGTCCTTACA TCGAGTACTT TCTTGACAAC 1200 

TACACCAGAT GGTACTACTT CATACCTTTG GTGCTTCTTT CGTTGAACTT TATAAGTTTG 1260 

CTCCACACAA GGXACTTGGA ACGCAGGTTC CACGCCAAGC CACTC66TAA CTTTGTCAGG 1320 

GACCCTACGT TT6GTATCGC TACTCCGTTG CTTTTGATCT ACTTGAAGTC GAAAGGTACG 1380 

GTCATGAAGT TTGCTT6GGG CCTCTGGAAC AACAAGTACA TCGTCAGAGA CCCAAA6TAC 1440 

AAGACAACTG GGCTCAGGAT TGTTGGCCTC CCATTGATTG AAACCATGGA CCCAGAGAAC 1500 

ATCAAGGCTG TTTTGGCTAC TCAGTTCAAT GATTTCTCTT TGGGAACCAG ACACGATTTC 1560 

TTGTACTCCT TGTTGGGTGA CGGTATTTTC ACCTTGGACG GTGCTGGCTG GAAACATAGT 1620 

AGAACTATGT TGAGACCACA GTTTGCTAGA GAACAGGTTT CTCACGTCAA 6TTGTTGGAG 1680 

CCACAGGTTC A6GTGTTCTT CAAGCACGTT AGAAA6CACC 6CG6TCAAAC 6TTC6ACATC 1740 

CAAGAATTGT TCTTCAGGTT GACCGTCGAC TCCGCCACCG AGTTCTTGTT TGGTGAGTCT 1800 

GCTGAATCCT TGAGGGACGA ATCTATTGGA TTGACCCCAA CCACCAAGGA TTTCGATGGC 1860 

AGAAGAGATT TCGCTGACGC TTTCAACTAT TCGCAGACTT ACCAGGCCTA CAGATTTTTG 1920 

TTGCAACAAA T6TACTGGAT CTTGAATGGC TCGGAATTCA GAAAGTCGAT TGCTGTCGTG 1980 

CACAAGTTTG CTGACCACTA TGTGCAAAAG GCTTTGGAGT TGACCGACGA TGACTTGCAG 2040 

AAACAAGACG GCTATGTGTT CTTGTACGAG TTGGCTAAGC AAACCAGAGA CCCAAA6GTC 2100 

TTGAGAGACC AGTTATTGAA CATTTTGGTT GCCGGTAGAG ACACGACCGC CGGTTTGTTG 2160 

TCATTTGTTT TCTACGAGTT GTCAAGAAAC CCTGAGGTGT TTGCTAAGTT GAGAGAGGAQ 2220 

GTGGAAAACA GATTTGGACT CGGTGAAGAA GCTCGTGTTG AAGAGATCTC GTTTGAGTCC 2280 

TTGAAGTCTT QTGAGTACTT GAAGGCTGTC ATCAATGAAA CCTTGAGATT 6TACCCATCG 2340 
GTTCCACACA ACTTTAGAGT TGCTACCAGA AACACTACCC TCCCAAGAGG TGGTGGTGAA 2400 

GATGGATACT C6CCAATT6T CGTCAAGAAG GGTCAAGTTG TCATGTACAC TGTTATTGCT 2460 

ACCCACAGAG ACCCAA5TAT CTACGGTCCC GACGCT6A0G TCTTCAGACC AGAAAGATGG 2520 

TTTGAACCA6 AAACTAGAAA 6TTG6GCTG6 GCATACGTTC CATTCAATGG TGGTCCAAGA 2580 

ATCTGTTTGG GTCAACAGTT TGCCTTGACC GAAGCTTCAT ACGTCACTGT CAGATTGCTC 2640 

CAGGAGTTTG CACACTTGTC TATGGACCCA GACACCGAAT ATCCACCAAA ATTGC7U3AAC 2700 

ACCTTGACCT TO iX XS Cr C lT TGATGGTGCT GAXGTTAGAA TGTACTAAGG TTGCTTTTCC 2760 

TTGCTAATTT TCTTCTGTAT AGCTT6T6TA TTTAAATTGA ATCGGCAATT GATTTTTCTG 2820 

ATACCAATAA CCOTAGTGCG ATTTGACCAA AACCGTTCAA ACTTrTTGTT CTCTCGTTGA 2880 

CGTGCTCGCT CATCAGCACT GTTTGAAGAC GAAAQAGAAA ATTTTTTGTA AACAACACTG 2940 

TCCAAATTTA CCCAACGTGA ACCATTATGC AAATGAGCGG CCCTTTCAAC TGCTCGCTGG 3000 

AAGCATTCGG GGATATCTAC AACXXrCCTTA ACTTTGAAAC AC3ACATTGAT TTAGACACCA 3060 

TAGATTTCAG CGGCATCAA6 AAT6ACCTTG CCXACATTTT GACGACCCCA ACACCACTGG 3120 

AAGAATCACG CCAGAAACTA GGCXSATGGAT CCAA6CCTGT GACCTTGCX^C AATG6AGAG6 3180 

AA6TGGA6TT GAACCAAGCG TTCCTAGAAG TTACCACATT ATTGTCGAAT GAGTTTGACT 3240 

TGGACCAATT GAACGCGGCA GAGTTGTTAT ACTACXOTGG CGACATATCC TACAAGAAGG 3300 

GCACATCAAT C6CA6ACAGT GCC^GATTGT CTTATTATTT GAGAGCAAAC TACATCTTGA 3360 

ACATACTT6G GTATTTGATT TCGAAGCAGC GATTGGATTT GATA6TCACG GACAACGACG 3420 

CGTTGTTTGA TAGTATTTTG AAAAGTTTTG AAAAGATCTA CAAOTTGATA AGCGTGTTGA 3480 

ACGATATGAT TGACAAGCAA AAGGTGACAA GCGACATCAA CAGTCTAGCA TTCATCAATT 3540 

GCATCAACTA CTCGAGAGGT CAACTATTCT CCGCACACGA ACTTTTGGGA CTGGTTTTGT 3600 

TTGGATTGGT CX3ACATCTAT TTCAACCAGT TTGGCACATT AGACAACTAC AAGAAGGTAT 3660 

TGQCATTGAT ACTGAAGAAC ATCAGCGATG AAGACATCTT GATCATACAC TTCCTCCCAT 3720 

CGACACTACA ATTGTTTAAG CTGGTGTTGG ACAAGAAAGA CGACGCTGCA 6TTGAACAGT 3780 

TCTACAAGTA CATCACTTCA ACAGTGTCAC GAGACTACAA CTCCAACATC GGCTCCACAG 3840 

CCAAAGATGA TATCGATTTG TCCAAAACCA AACTCAGTGG CTTTGAGGTG TTGACGAGTT 3900 

(2) INFORMATION FOR SEQ ID NO: 89: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3668 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 

CCTGCAGAAT TCGCGGCCGC GTCGACAGAG TAGCAGTTAT GCAA6CAT6T GATTGTG6TT 60 

TTTGCAACCT GTTTGCACGA CAAATGATCG ACAGTCGATT ACGTAATCCA TATTATTTAG 120 

AGGGGTAATA AAAAATAAAT G6CAGCCAGA ATTTCAAACA TTTTGCAAAC AATGCAAAAG 180 

ATGAGAAACT CCAACAGAAA AAATAAAAAA ACTCCGCAGC ACTCCGAACC AACAAAACAA 240 

TG6GGGGCGC CAGAATTATT GACTATTGTG ACTTTTTTTT ATTTTTTCCG TTAACTTTCA 300 

TT6CAGTGAA GTGTGTTACA CGGGGTGGTG ATG6TGTTG6 TTTCTACAAT GCAAGG6CAC 360 

AGTTGAAGGT TTCCACATAA CGTTGCACCA TATCAACTCA ATTTATCCTC ATTCATGTGA 420 

TAAAAGAAGA GCCAAAAG6T AATTGGCAGA CCCCCCAAGG GGAACACGGA GTAGAAAGCA 480 

ATGGAAACAC GCCCATGACA GTGCCATTTA GCCCACAACA CATCTAGTAT TCTTTTTTTT 540 

TTTTGTGCGC AGGTGCACAC CTGGACTTTA GTT A TT G CCC CATAAA6TTA ACAATCTCAC 600 

CTTTGGCTCT CCCAGTGTCT CCGCCTCCAG ATGCTCGTTT TACACCCTCG A6CTAACGAC 660 

AACACAACAC CCATGAGGGG AATGGGCAAA GTTAAACACT TTTGGTTTCA ATGATTCCTA 720 

TTTGCTACTC TCTTGTTTTG TGTTTTGATT TGCACCATGT GAAATAAACG ACAATTATAT 780 

ATACCTTTTC GTCTGTCCTC CAATGTCTCT TTTTGCTGCC ATTTTGCTTT TTGCTTTTTG 840 

CTTTTGCACT CTCTCCCACT CCCACAATCA 6TGCAGCAAC ACACAAAGAA GAAAAATAAA 900 

AAAACCTACA CTATGTCGTC TTCTCCATCQ TTTGCTCAGG AGGTTCTCGC TACCACTAGT 960 

CCTTACATCG AGTACTTTCT TGACAACTAC ACCAGATGGT ACTACTTCAT CCCTTTGGTG 1020 

CTTCTTTCGT TGAACTTCAT CAGCTTGCTC CACACAAAGT ACTTGGAACG CAGGTTCCAC 1080 

GCCAAGCCGC TCGGTAACGT CGTGTTGGAT CCTACGTTTG GTATCGCTAC TCC6TTGATC 1140 

TTQATCTACT TAAAGTCGAA A6GTACAGTC ATGAAGTTTG CCTGGAGCTT CTGGAACAAC 1200 

AAGTACATT6 TCAAAGACCC AAAGTACAAG ACCACTGGCC TTAGAATTGT CGGCCTCCCA 1260 






TTGATTCAAA CCATAGACCC AGAGAACATC AAAGCTOTGT TGGCTACTCA GTTCAACGAT 1320 

TTCTCCTTGQ GAA^CTASACA CGATTTCTTG TACTCCTTGT T6GGCGATGG TATTTTXIU^C 1380 

TTGGAC6GTG CTGGCTGGAA ACACAGTAQA ACTATGTTGA GACCACAGTT TGCTAGAGAA 1440 

CAG6TTTCCC ACGTCAAGTT GTT6GAACCA CACGTTCACjO TGTTCTTCAA GCACGTTAfiA 1500 

AAACACCGCG GTCAGACTTT TGACATCCAA GAATTGTTCT TCAGATTGAC CGTCGACTCC 1560 

GCCACCXSAGX TCTTGTTTG6 TGAGTCTGCT GAATCCTTGA GAGACGACTC TGTT6GTTTG 1620 

ACCCCAACCA CCAAGQATTT CX^AAGGCAGA GGAGATTTCG CTGACXICTTT CAACTACTCG 1680 

OUSACTTACC AGGCCTACAG ATTTTTGTT6 CAACAAAIGT ACTG6ATTTT GAATGG06C6 1740 

GAATTCAGAA AGTCGATT6C CATCGTGCAC AAGTTTGCTG ACCACTATGT GCAAAAG6CT 1800 

TTGGAGTTGA CCGACGATGA CTTGCAGAAA CAAGACGGCT ATGTGTTCTT GTACGAGTTG 1860 

GCTAAGCAAA CTAGAGACCC AAAGGTCTTG AGAGACCAGT TGTTGAACAT TTTGGTT6CC 1920 

GGTAGA6ACA CGACCGCCGG TTT6TTGTC0 TTTGTGTTCT ACGAGTTGTC GAGAAACCCT 1980 

GAAGTGTTTG CCAAGTTGA6 AGAGGAG6TG GAAAACA6AT TT6GACTCG6 CGAAGA6GCT 2040 

CGT6TTGAA0 AGATCTCTTT TGA6TCCTT6 AAGTCCT6TG AGTACTTGAA GGCT6TCATC 2100 

AATQAAGCCT TGAGATTGTA CCCATCTGTT CCACACAACT TCAGAGTTGC CACX:AGAAAC 2160 

ACTACCCTTC CAAGAG6CGG TGGTAAAGAC GGATGCTCGC CAATTGrTTGT CAAGAAGGGT 2220 

CAAGTTGTCA TGTACACTGT CATTGGTACC CACAGAGACC CAAGTATCTA CGGTGCCX3AC 2280 

GCCGACGTCT TCAGACCAGA AA6ATGGTTC 6AGCCAGAAA CTAGAAAGTT GGGCTGGGCA 2340 

TATGTTCCAT TCAATG6TG6 TCCAAGAATC TGTTTGGGTC A6CAGTTTGC CTTGACTGAA 2400 

GCTTCATACG TCACTGTCAG ATTGCTCCAA GAGTTTGGAA ACTTCTCCCT GGATCCAAAC 2460 

GCTGAGTACC CACCAAAATT GCAGAACACC TTGACCTTGT CACTCTTTGA TGGTGCTGAC 2520 

GTTAGAATGT TCTAAGGTT6 CTTATCCTTG CTAGTGTTAT TTAT7U5TTTG TGTATTTAAA 2580 

TTGAATCGGC GATTGATTTT TCTGGTACTA ATAACTGTAG TGGGTTrTGA CCAAAACCGT 2640 

TCAAACTTTT ' rrrrrm " ! " ! ' TCTTCCCTCCT ACCTTCGTTG CTCGCTCATC AGCACTCTTT 2700 

GAAAACGAAA AAAGAAAATT TTTTGTAAAC AACATTGCCC AAACTTACCC AACGTGAACC 2760 

ATTATAACCA AATGAGCGGC GCTTTCAACT GGTCACT6GA GGCATTCGGG GATATCTACA 2820 

ACACCCTTAA GTTTGAGGAA GACATTGATT TAGACACCAT AGATTTCAGC GGCATCAAQA 2880 

ATGACCTTGT CCACATTTTG ACAACCCCAA CACCACTGGA AGAATCX3CGC CAGAAACTAG 2940 

GCGATGGATC CAAGCCTGTG GCCTTGCCCA ATGGAGACGA AGTGGAGTTG AACCAAGCGT 3000 

TCCTAGAAGT TACCACATTA TTGTCGAACG AGTTTGACTT GGACCAATTG AACGCGGCCX5 3060 

AGTTGTTATA CTACGCCGGC GACATATCCT ACAAGAAGGG CACATCAATT GCCGACAGTG 3120 

CCAGATTGTC TTACTATTTG AGA6CAAACT ACATCTTGAA CATACTTGGG TACTTTATTT 3180 

CGAAGCAGCG ATTGGATGTG ATAGTCACC6 ACAACAACGC GTTGTTTGAT AATATTTTGA 3240 

AAAGTTTTGA AAAGATCTAC AAGTTGATAA GCGCGTTGAA CGATATGATT GACAAGCAAA 3300 

AGGTGACAA6 CGACATCAAC AGTCTAGCAT TTATCAACTG CATCAACTAC TCGAGGGGTC 3360 

AACTATTCTC CGCACACGAA CTTTTGGGAC TGGTTTTGTT TGGATTGGTT GACAACTATT 3420 

TCAACCAGTT TGGCTCATTA GACAACTACA AGAAAGTATT GGCATTGATA CTGAAGAACA 3480 

TCAGTGATGA AGATATCTTG ATCGTACGCT TCCTCCCATC GACACTACAA TTGTTTAAGC 3540 

TG6TGTTGGA TAAGAAAGAC GACGCCACTG TTGACCAGTT CTACAA6TAC ATCACCTCAA 3600 

CAGTGTCGCA AGACTACAAC TCCAACATCG GAGCCACAGC CAAAGATGAT ATCGATTTGT 3660 

CCAAAGCC 3668 



(2) INFORMATION FOR SEQ ID NO: 90: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3826 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90: 
TGGAGTCGCC AGACTTGCTC ACTTTTGACT CCCTTCGAAA CTCAAAGTAC GTTCAGGCGG 60 
TGCTCAACGA AACGCTCCGT ATCTACCCGG GGGTACCACG AAACATGAAG ACAGCTACGT 120 
GCAACACGAC GTTGCCACGC GGAGGAGGCA AAGACGGCAA GGAACCTATC TTGGTGCAGA 180 
AGGGACAGTC CGTTGGGTTG ATTACTATTG CCACGCAGAC GGACCCAGAG TATTTTGGGG 240 
CCGACGCTGG TGAGTTTAAG CCGGAGAGAT GGTTTGATTC AAGCATGAAG AACTTGGGGT 300 
GTAAATACTT GCCGTTCAAT GCTGGGCCAC GGACTTGCTT GGGGCAGCAG TACACTTTGA 360 
TTGAAGCGAG CTACTTGCTA GTCCX3GTTGG CCCAGACCTA CCGGGCAATA GATTTGCAGC 420 

CAGGATC6GC GTACCCACCA AGAAA6AA6T CGTTGATCAA CATGAGTGCT GCCGACG6GG 480 

TGTTTGTAAA GCTTTATAAG GAT6TAACG6 TAGATGGATA GTTGTGTAGG AGGA6CG6A6 540 

ATAAATTAGA TTTGATTTTG TGTAAGGTTT TGGAT6TCAA CCTACTCCGC ACTTCAT6CA 600 

GTGTGTGTGA CACAA6GGTG TTICTACGTGT 6C6T6TGCGC CAAGAGACA8 CCCAA66G6G 660 

TG6TAGTGTG TGTTGGCGGA AGTGCAT6TG ACACAACGCG T6GGTTCT6G CXIAATGGTGG 720 

ACTAAGT6CA 6GTAAGCAQC GACCTGAAAC ATTCCTCAAC GCTTAAGACA CT6GT6GTAG 780 

AGATGOGGAC CAGGCTATTC TTCSTOGTGCT ACCC6G06CA TGGAAAATCA ACT6C6GGAA 840 

GAATAAATTT ATCCGTAGAA TCCACAGAGC GGAIAAATTT GCCCACCTCC ATCATCAACC 900 

ACGCCGCCAC TAACTACATC ACTCCCCTAT TTTCTCTCTC TCTCTTTGTC TTACTCCGCT 960 

CCC6TTTCCT TAGCCACAGA TACACACCCA CTGCAAACAQ CAGCAACAAT TATAAAGATA 1020 

CGCCAGGCCC ACCTTCTTTC TTTTTCTTCA CTTTTTTGAC TGCAACTTTC TACAATCCAC 1080 

CACAGCCACC ACCACAGCCG CTATGATTGA ACAACTCCTA GAATATTG6T ATGTCGTTGT 1140 

6CCA6TGTTG TACATCATCA AACAACTCCT T6CATACACA AA6ACTCGCG TCTTGATGAA 1200 

AAAGTTGGGT GCTGCTCCAG TCACAAACAA GTTGTACGAC AACGCTTTCG GTATCGTCAA 1260 

TGGATGGAAG 6CTCTCCAGT TCAAGAAAGA GGGCAGGGCT CAAGAGTACA ACGATTACAA 1320 

GTTT6ACCAC TCCAAGAACC CAAGC6TGGG CACCTACXSTC AGTATTCTTT TCGGCACCAG 1380 

QATC6TCX3TG ACCAAA6ATC CAGAGAATAT CAAAGCTATT TTGGCAACCC AGTTTGGTGA 1440 

TTTTTCTTTG GGCAAGAGGC ACACTCTTTT TAAGCCTTTG TTAGGTGATG GQATCTTCAC 1500 

ATTGGACGGC GAAGGCTGGA AGCACAGCAG AGCCATGTTG AGACCACAGT TTGCCAGAGA 1560 

ACAAGTTGCT CATGTGACGT CGTTGGAACC ACACTTCCAG TTGTTGAAGA AGCATATTCT 1620 

TAAGCACAAG GCTGAATACT TTGATATCCA GGAATTCTTC TTTAGATTTA CCGTTGATTC 1680 

6GCCACGGAG TTCTTATTTG GTGAGTCCGT GCACTCCTTA AAGGACGAAT CTATTGGTAT 1740 

CAACCAAGAC GATATAGATT TTGCTGGTAG AAAGGACTTT GCTGAGTCGT TCAACAAAGC 1800 

CCAGGAATAC TTGGCTATTA GAACCTTGGT GCAGACGTTC TACTGGTTGG TCAACAACAA 1860 

GGAGTTTAGA GACTGTACCA AGCTGGTGCA CAAGTTCACC AACT21CTATG TTCAGAAAGC 1920 

TTT6GATGCT AGCCCAGAA6 AGCTTGAAAA GCAAAGTGGO TATGT6TTCT TGTACGAGCT 1980 

T6TCAAGGAG ACAAGAGACC CCAAT6TGTT GCGTGACCAG TCTTTGAACA TCTTGTTGGC 2040 

C6GAAGAGAC ACCACTGCTG GGraGTTGTC GTTTGCTGTC TTTGAGTTGG CCAGACACCC 2100 

AGAGATCT6G GCCAAGTTGA GAGAGGAAAT TGAACAACAG TTTGGTCTTG GAGAAGACTC 2160 

TCGTGTTGAA GAGATTACCT TTGAGAGCTT GAAGAGATGT GAGTACTTGA AAGCGTTCCT 2220 

TAATGAAACC TTGCX5TATTT ACCCAAGTGT CCCAAQAAAC TTCAGAATCG CCACCAAGAA 2280 

CACGACATTG CCAAGGGGCG GTGGTTCAGA CGGTACCTCG CCAATCTTGA TCCAAAAGGG 2340 

AGAAGCTGTG TCGTATGGTA TCAACTCTAC TCATTTGGAC CCTGTCTATT ACG6CCCTGA 2400 

TGCTGCTGAG TTCAGACCAG AGAGATGGTT TGAGCCATCA ACCAAAAAGC TCGGCTGGGC 2460 

TTACTTGCCA TTCAACGGTG GTCCAAGAAT CTGTTTGGGT CAGCAGTTTG CCTTGACGGA 2520 

AGCTGGCTAT GTGTTGGTTA GATTGGTGCA AGAOTTCTCC CACGTTAGGC TGGACCC3VGA 2580 

CGAGGTGTAC CCGCCAAAGA GGTTGACCAA CTTGACCATG TGTTTGCAGG AT6GTGCTAT 2640 

TGTCAAGTTT GACTAG06GC 6TGGTG2UITG CGTITGATTT TGTAGTTTCT 6TTTGCAGTA 2700 

ATGAGATAAC TATTCAGATA AGGCGAGTGG ATGTACGTTT TGTAAGAGTT TCCTTACAAC 2760 

CTTGGTGGGG TGTGTGAGGT TGA6GTT6CA TCTTGGGGAG ATTACACCTT TTGCAGCTCT 2820 

CCGTATACAC TTOTACTCTT TGTAACCTCT ATCAATCATG TGGGGGGGGG GGTTCATTGT 2880 

TTGGCCATGG TGGT6CATGT TAAATCCGCC AACTACCCAA TCTCACATGA AACTCAAGCA 2940 

CACTAAAAAA AAAAAAGATG TTG6GGGAAA ACTTTGGTTT CCCTTCTTAG TAATTAAACA 3000 

CTCTCACTCT CACTCTCACT CTCTCCACTC AGACAAACCA ACCACCTGGG CTGGAGACAA 3060 

CCAQAAAAAA AAAGAACAAA ATCCAGATAG AAAAACAAAG GGCTGGACAA CX!ATAAATAA 3120 

ACAATCTAGG GTCTACTCCA TCTTCCACTG TTTCTTCTTC TTCAGACTTA GCTAACAAAC 3180 

AACTCACTTC ACCATGGATT ACGCAGGCAT CACGCGTGGC TCCATCAGAG GCGA6GCCTT 3240 

GAAGAAACTC GCAGAATTGA CCATCCAGAA CGAGCCATCC AGCTTGAAAG AAATCAACAC 3300 

CGGCATCXAG AAGGACGACT TTGCCAAGTT GTTGTCTGCC ACCCCGAAAA TCCCCACCAA 3360 

GCACAAGTTG AACGGCAACC ACGAATTGTC TGAGGTC6CC ATTGCCAAAA AGGAGTACGA 3420 

GGTGTTGATT GCCTTGAGCG ACGCCACAAA AGACCCAATC AAAGTGACCT CCCAGATCAA 3480 

GATCTTGATT GACAAGTTCA AGGT6TACTT GTTTGAGTTG CCT6ACCAGA AGTTCTCCTA 3540 

CTCCATCGTG TCCAACTCCG TCAACATCGC CCCCTGGACC TTGCTCGGGG AGAAGTTGAC 3600 

CACGGGCTTG ATCAACTTGG CCTTCCAGAA CAACAAGCAG CACTTGQACG AGGTCATTGA 3660 

CATCTTCAAC GAGTTCATCG ACAAGTTCTT TGGCAACACG GAGCCX3CAAT TGACCAACTT 3720 
CTTGACCTTG TGCX3GTGTGT TGGACGGCTT GATPGACCAT GCCAACTTCT TGAGCCTGTC 3780 

CTC6C6GACC TTCSUUSATCT TCTTGAACTT G6ACTC6TAT 6TGGAC 3826 

(2) INFORMATION FOR SEQ ID NO: 91:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3910 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 

TTACAATCAT 6GAGCTCGCT AGGAACCCAG ATGTCT6GGA GAA6CTCCGC GAAGAGGTCA 60 

ACACGAACTT TGGCATGGA6 TCGCCAGACT TGCTCACTTT TGACTCTCTT AGAAGCTCAA 120 

AGTAOGTTCA GGCGGTGCTC AACGAAAC6C TTCGTATCTA CCCGG6GGTG CCAG6AAACA 180 

TGAA6ACAGC TACGT6CAAC ACGACGTTGC CGCGTGGAGG AGGCAAAGAC GGTAAGGAAC 240 

CTATTTTGGT GCAGAAGGGC CAGTCCGTTG GGTTGATTAC TATTGCCACG CAGACGGACC 300 

CAGAGTATTT TGGGGCAGAT GCTGQTGAGT TCAAACCGGA GAGATGGTTT GATTCAAGCA 360 

TGAAGAACTT GGGGT6TAAG TACTTGCCGT TCAATGCTGG GCCCCGGACT TGTTTGGGGC 420 

AGCAGTACAC TTT6ATTGAA GCGAGCTATT TGCTAGTCA6 GTTGGCGCAG ACCTACCG66 480 

TAATCGATTT GCTGCCAGGG TCGGCGTACC CACCAAGAAA GAA6TCGTTG ATCAATATGA 540 

GTGCTGCCGA TGGGGTGGTT GTAAAGTTTC ACAA6GATCT AGATGGATAT GTAAGGT6TG 600 

TAGGAGGAGC GGAGATAAAT TAGATTTGAT TTTGTGTAAG GTTTAGCACG TCAAGCTACT 660 

CCGCACTTTG TGTGTAGGGA GCACATACTC CGTCTGCGCC TGTGCCAAGA GACGGCCCAG 720 

GG6TA6TGTG TGGTGGTGGA AGTGCATGTG ACACAATACC CTG6TTCTG6 CCAATTGGGG 780 

ATTTAGTGTA GGTAAGCT6C GACCTGAAAC ACTCCTCAAC GCTTGAGACA CTGGTGGGTA 840 

GAGATGCGGG CCAGGAGGCT ATTCTTGTCG TGCTACCCGT GCACGGAAAA TCGATTGAGG 900 

GAAGAACAAA TTTATCCGTG AAATCCACAG AGCGGATAAA TTTGTCACAT TGCTGCGTTG 960 

CCCACCCACA GCATTCTCTT TTCTCTCTCT TTGTCTTACT CCGCTCCTGT TTCCTTATCC 1020 

AGAAATACAC ACCAACTCAT ATAAAGATAC 6CTAGCCCAG C l X iT C TTTCT TTTTCTTCAC 1080 

TTTTTTTGGT GTGTTGCTTT TTTGGCTGCT ACTTTCTACA ACCACCACCA CCACCACCAC 1140 

CATGATTGAA CAAATCCTAG AATATTGGTA TATTGTTGTG CCTGTGTTGT ACATCATCAA 1200 

ACAACTCATT GCCTACAGCA AGACTCGCGT CTTGATGAAA CAGTTGGGTG CTGCTCCAAT 1260 

CACAAACCAG TTGTACGACA ACGTTTTCGG TATCGTCAAC GGATGGAAGG CTCTCCAGTT 1320 

CAA6AAAGAG GGCAGAGCTC AAGAGTACAA CGATCACAAG TTTGACAGCT CCAAGAACCC 1380 

AAGCGTCGGC ACCTATGTCA GTATTCTTTT TGGCACCAAG ATTGTCGTGA CCAAGGATCC 1440 

AGAGAATATC AAAGCTATTT TGGCAACCCA GTTTGGCGAT TTTTCTTTGG GCAAGAGACA 1500 

CGCTCTTTTT AAACCTTTGT TAGGTGATGG GATCTTCACC TTGGACGGCG AAGGCTGGAA 1560 

GCATAGCA6A TCCATGTTAA 6ACCACAGTT TGCCAGAGAA CAAGTTGCTC ATGTGACGTC 1620 

GTTGGAACCA CACTTCCA6T TGTTGAAGAA GCATATCCTT AAACACAAGG 6TGAGTACTT 1680 

T6ATATCCA0 GAATTGTTCT TTASATTTAC TGTOQACTOG GCCACGGAGT TCTTATTTGG 1740 

TGAGTCCGTG CACTCCTTAA A6GACGAAAC TATCGGTATC AACCAAGACG ATATAGATTT 1800 

TGCTGGTAGA AAGGACTTTG CTGAGTCGTT CAACAAAGCC CAGGAGTATT TGTCTATTAG 1860 

AATTTTGGTG CAGACCTTCT ACTGGTTGAT CAACAACAAG GAGTTTAGAG ACTGTACCAA 1920 

GCTGGTGCAC AAGTTTACCA ACTACTAT6T TCAGAAAGCT TTGGATGCTA CCCCAGAGQA 1980 

ACTT6AAAAG CAAG6C6GGT ATGTGTTCTT GTATGAGCTT GTCAAGCAGA CGAGAGACCC 2040 

CAAGGTGTT6 CGTGACCAGT CTTTGAACAT CTTGTTGGCA GGAAGAGACA CCACTGCTGG 2100 

GTTGTTGTCC TTTGCTGTGT TTGAGTT6GC CAGAAACCCA CACATCTGGG CCAAGTTGAG 2160 

AGAGGAAATT GAACAGCAGT TT6GTCTTGG AGAAGACTCT CGT6TTGAAG AGATTACCTT 2220 

TGAGAGCTTG AAGAGATGTG AGTACTTGAA AGCGTTCCTT AACGAAACCT T606TGTTTA 2280 

CCCAAGTGTC CCAAGAAACT TCAGAATCGC CACCAAGAAT ACAACATTGC CAAGGGGTGG 2340 

TGGTCCA6AC GGTACCCAGC CAATCTTGAT CCAAAAGGGA GAAGGTGTGT CGTATGGTAT 2400 

CAACTCTACC CACTTAGATC CTGTCTATTA TGGCCCTGAT GCTGCTGAGT TCAGACCAGA 2460 

GAGATGGTTT GAGCCATCAA CCAGAAAGCT CGGCTGGGCT TACTTGCCAT TCAACGGTGG 2520 

GCCACGAATC TGTTTGGGTC AGCAGTTTGC CTTGACCGAA GCTGGTTACG TTTTGGTCAG 2580 

ATT6GTGCAA GAGTTCTCCC ACATTAGGCT GGACCCAGAT GAAGTGTATC CACCAAAGAG 2640 

6TT6ACCAAC TTGACCATGT GTTTGCAGGA TGGTGCTATT 6TCAAGTTTG ACTAGXACGT 2700 
ATGAGTGCGT TTGATTTTCT AGTTTCTGTT TGC^GTAATG AGATAACTAT TCAGATAAGG 2760 

CXaGGTGGATG TACGTTTTCT AAGAGTTTCC TTACAACCCT GGTGGGTGTG TGAGGTTGCA 2820 

TCTTAfiGGAG AGAXAGCACC TTTTGCA6CT CTCCGIATAC A6TTTTACTC TTTGTAACCT 2880 

AT6CCAATCA TGTG6GGATT CATTGTTTGC CCATGGTG6T GCAT6CAAAA TCCCCCCAAC 2940 

TACCXAATCT CACATGAAAC TCAAGCACAC TAGAAAAAAA AGATGTTGCG TGGGTTCTTT 3000 

TGATGTTGGG GAAAACTTTC GTTTCCTTTC TCAGTAATTA AACGTTCTCA CTCAGACAAA 3060 

CCACCTGGGC TGCAGACAAC CAGAAAAAAC AAAATCCAGA TAGAAGAAGA AAGGGCT6GA 3120 

CAACCATAAA TAAACAACCT A66GTCCACT CCATCTTTCA CTTCTTCTTC TTCAGACTTA 3180 

TCTAACAAAC GACTCACTTC ACCATGGATT ACGGAGGTAT CACGCX3TGGG TCCATGA6AG 3240 

6CGAAGCCTT GAA6AAACTC GCCGAGTTGA CCATCCAGAA CCAGCCATCC AGCTTGAAAG 3300 

AAATCAACAC CGGCATCCAG AAGGACGACT TTGCCAA6TT GTTGTCTTCC ACCXTCGAAAA 3360 

TCCACACCAA GCACAA6TTG AATGGCAACC ACGAATTGTC CXSAAGTCGCC ATTGCCAAAA 3420 

AGGAGTACGA GGTGTTGATT GCCTTGAGCG ACGCCAC6AA AGAACCAATC AAA6TCACCT 3480 

CCCAOATCAA GATCTTGATT GACAAGTTCA A66TGTACTT GTTrGAGTTG CCCX3ACCAGA 3540 

AGTTCTCCTA CTCCATCGTG TCCAACTCCG TTAACATTGC CCCCTGGACC TTGCTCGGTG 3600 

AGAAGTTGAC CACGGGCTTG ATCAACTTGG CGTTCCAGAA CAACAAGCAG CACTTGGACG 3660 

AAGTCATCGA CATCTTCAAC GAGTTCATCG ACAAGTTCTT TGGCAACACA GA6CCGCAAT 3720 

TGACCAACTT CTTGACCTTG TCCGGTGTGT TGGACGGGTT GATTGACCAT GCCAACTTCT 3780 

TGAGC6TGTC CTCCAGGACC TTCAAGATCT TCTTGAACTT GGACTCGTTT GTGGACAACT 3840 

CGGACTTCTT GAACGACGTG GAGAACTACT CCGACTTTTT GTACGACGAG CCGAACGAGT 3900 

ACCAGAACTT 3910 

(2) INFORMATION FOR SEQ ID NO: 92:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3150 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92: 

QAATTCTTTG GATCTAATTC CA6CTGATCT TGCTAATCCT TATCAACGTA GTT6TGATCA 60 

TTQTTTGTCT GAATTATACA CACCAGTGGA AGAATATGGT CTAATTTGCA CGTCCCACTG 120 

GCATTGTGTG TTTGTGGGGG GGGGGGGGTG CACACATTTT TAGTGCCATT CTTTGTTQAT 180 

TACCCCTCCC CCCTATCATT CATTCCCACA GGATTAGTTT TTTCCTCACT GGAATTCGCT 240 

GTCCACCTGT CAACCCCCCC CCCCCCCCCC CCCACTGCCC TACCCTGCCC TGCCCTGCAC 300 

GTCCTGT6TT TTGT6CTGTG TCnTCCCAC GCTATAAAAG CCCTG6CGTC C6GCCAAGGT 360 

TTTTCCACCC AGCCAAAAAA ACAGTCTAAA AAATTTGGTT GATCCTTTTT GGTTGCAAGG 420 

TTTTCCACCA CCACTTCCAC CACCTCAACT ATTCGAACAA AAGATGCTCG ATCAGATCTT 480 

ACATTACTGG TACATTGTCT TGCCATTGTT GGCCATTATC AACCAGATCG TGGCTCATGT 540 

CAG6ACCAAT TATTTGATGA A6AAATTGGG TGCTAAGCCA TTCACACACG TCCAACGTGA 600 

CGGGT6GTTG G6CTTCAAAT TCG6CCCTGA ATTCCTCAAA GCAAAAAGTG CTGG6A6ACT 660 

GGTTGATTTA ATCATCTCCC GTTTCCACGA TAATGAGGAC ACTTTCTCCA GCTATGCTTT 720 

TGGCAACCAT GTG6TGTTCA CCAGGGACCC CGAGAATATC AAGGCGCTTT TGGCAACCCA 780 

GTTTGGTGAT TTTTCATTGG GCAGCAGGGT CAAGTTCTTC AAACCATTAT TGGGGTACGG 840 

TATCTTCACA TTGGACGCCG AAGGCTGGAA GCACAGCAGA 6CCATGTTGA 6ACCACAGTT 900 

TGCCA6AGAA CAAGTTGCTC ATGTQACGTC GTTGQAACCA CACTTCCAGT TGTTGAAGAA 960 

GCATATCCTT AAACACAAGG GTGAGTACTT TGATATCCAG GAATTGTTCT TTAGATTTAC 1020 

TGTCGACTCG GCCACGGAGT TCTTATTT6G TGAGTCCGTG CACTCCTTAA AGGACGAGGA 1080 

AATTGGCTAC GACACGAAAG ACATGTCTGA AQAAT^GACGC AGATTTGCCG ACGCGTTCAA 1140 

CAAGTCGCAA GTCTACGTGG CCACCAGAGT TGCTTTACAG AACTTGTACT GGTTGGTCAA 1200 

CAACAAAGA6 TTCAAGGA6T GCAATGACAT TGTCCACAA6 TTTACCAACT ACTAT6TTCA 1260 

GAAAGCCTTG GATGCTACCC CAGAGGAACT TGAAAAGCAA GGCGGGTATG TGTTCTTGTA 1320 

TGAGCTTGTC AAGCAGACGA GAGACCCCAA GGTGTTGCGT GACCAGTCTT TGAACATCTT 1380 

GTTGGCAGGA AGAGAC7VCCA CTGCTGGGTT 6TTGTCCTTT GCTGTGTTTG AGTTGGCCAG 1440 

AAACCCACAC ATCTGG6CCA AGTTGAGAGA GGAAATTGAA CAGCAGTTTG GTCTTGGAGA 1500 

AGACrCTCGT 6TT6AAGAGA TTACCTTTGA GAGCTTGAAG AGATGTGAGT ACTTGAAGGC 1560 






CGTGTTGAAC GAAACTTTGA GATTACACCC AAGTGTCCCA AGAAACGCAA GATTTGCGAT 1620 

TAAAGACACG ACTTTACCAA GAGGC66TGG .CCCCAACGGC AAGGATCCTA TCTTGATCA6 1680 

GAAGGATGAG GTGGTGCAGT ACTCCATCTC GGCAACTCAG ACAAATCCTG CTTATTATCG 1740 

OGCCGATGCT GCTGATTTTA GACCXSGAAAG ATGGTTTGAA CCATCAACTA GAAACTTGGG 1800 

ATGGGCTTTC TTGCCATTCA ACX3GTGGTCC AAGAATCTGT TTGGGACAAC AGTTTGCTTT 1860 

GACTGAAGCC GGTTACGTTT TGGTTAGACT TGTTCAGGAE3 TTTCCAAACT TGTCACAAGA 1920 

CCCCX3AAACC AAGTACCCAC CACCTAGATT GGCACACTTG ACGATGTGCT TGTTTGAOGG 1980 

TGCACACGTC AAGATGTCAT AGGTTrCCCC ATACAAGTAG TTCAGTAATT ATACACTOTT 2040 

TTTACTTTCT CTTCATACCA AATGGACAAA AGTTTTAAGC ATGCCTAACA ACOTGACCG6 2100 

ACAATTGTGT CGCACTAGTA TGTAACAATT GTAAAAATAG TGTACACTAA TTTGTGGTGG 2160 

CCGGAGATAA ATTACAGTTT GGTTTTCTGT AAACTCGCGG ATATCTCTGG CAGTTTCTCT 2220 

TCTCCGCAGC AGCTTTGCCA CGGGTTTGCT CTGGGGCCAA CAAATTCAAA AGGGGGAGAA 2280 

ACTTAACACC CCTTATCTCT CCACTCTAGG TTGTAGCTCT TGTGGGGATG CAATTGTOGT 2340 

ACGTTTTTTA TGTTTTGTCT AGACTTTGAT GATTACGTTG GATTTCTTAT GTCTGAGGCG 2400 

TGCTTGAAAQ AAGTGTCAAA ATGTGACAGG CGACX3CTATT CX3ACATGAAC GCX3AAAGGGT 2460 

TATTTGCATC AATACX5AGGQ GCTGACTCTA GTCTAGGATG GCAGTCCTAG GTTGCAAACA 2520 

TGTTGCACCA TATCCCTCCT GGAGTTGGTC QACCTC6CCT ACGCCACCCT CAGCGATC6G 2580 

CACTTTCCGT TGTTCAATAT TTCTCCTTCC OITTGTTCCA GGGGTTATCA ACAACGTTGC 2640 

CGGCCTCCTC CCCAAATTAC AAGAAAAATA AATTGTCGCA CGGCACCGAT CTGTCAAAGA 2700 

TACAGATAAA CCTTAAATCT GCAAAAACAA GACCCCTCCC CATAGCCTAG AAGCACCAGC 2760 

AAGATGATGG AGCAACTCCT CCAGTACTGG TACATCGCAC TCTCTGTATG GTTCATCCTT 2820 

CGCTACTTGG CTTCCCACGC ACGAGCCGTC TACTTGOGCC ACAAGCTCGQ CGCGGCX3CCA 2880 

TTCACGCACA CCCAGTACGA CGGCTGGTAT GGGTTCAAGT TTGGGCGGGA CTTTCTCAAQ 2940 

GCGAAGAAGA TCGGGCG6CA GACGGACTTG GTGCATGCGC GGTTCCGTGG CGGCATGGAC 3000 

ACCTTCTCGA GCTACACTTT CGGCATCCAT ATCATCCTTA CCCGGGACCC GGAGAACATC 3060 

AAGGCGGTCT TGGCGACGCA GTTCGATGAC TTCTCGCTCG GTGGCAG6AT CAGGTTCTTG 3120 

AAGCCGTTGT TGGGGTATGG GATATTCAGG 3150 

(2) INFORMATION FOR SEQ ID NO: 93:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3579 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93: 

AAAACCGATA CAAGAAGAAG ACAGTCAACA AGAACGTTAA TGTCAACCAG GCGCCAAGAA 60 

GACGGTTTGG CGGACTTGGA AGAATGTGGC ATTTGCCCAT GATGTTTATG TTCTGGAGAG 120 

GTTTTTCAAG GAATCGTCAT CCTCCGCCAC CACAAGAACC ACCAGTTAAC GA6ATCCATA 180 

TTCACAACCC ACCGCAAGGT GACAATGCTC AACAACAACA GCAACAACAA CAACCCCCAC 240 

AAGAACAGTG GAATAATGCC AGTCAACAAA GAGTGGTGAC AGACGAGGGA GAAAACGCAA 300 

GCAACAGTGG TTCTGATGCA AGATCAGCTA CACCGCTTCA TCAGGAAAAG CAGGAGCTCC 360 

CACCACCATA TGCCCATCAC GAGCAACACC AGCAGGTTAG TGTATAGTAG TCTGTAGTTA 420 

AGTCAATGCA ATGTACCAAT AAGACTATCC CTTCTTACAA CCAAGTTTTC TGCOGCGCCT 480 

GTCTQGCAAC AGATGCTGGC CGACACACTT TCAACTGAGT TTGGTCTAGA ATTCTTGCAC 540 

ATGCACGACA AGGAAACTCT TACAAAGACA ACACTTGTGC TCTGATGCCA CTTGATCTTG 600 

CTAAGCCTTA TCAACQTAAT TGAGATCATT GTTTGTCTGA ATTATACACA CCAGTGGAAG 660 

AATCTGGTCT AATCTGCACG CCTCATGGGC ATTGTGTGTT TTGGGGGGGG GGGGGGGGGT 720 

GCACACATTT TTAGTGOGAA TGTTTGTTTG CTGGTTCCCC CTCCCCCCTC CCCCCTATCA 780 

TGCCCACAGG ATTAGTTTTT TCCTCACTGG AATTCGCTGT CCACCTGTCA ACCCCCTCAC 840 

TGCCCTGCCC TGCCCTGCAC GCCCTGTGTT TTGTGCTGTG GCACTCCCAC GCTATAAAAG 900 

CCCTGGCGTA CGGCCAAGGT TTTTCCTCAC AGCCAAAAAA AAATTTGGCT GATCCTTTTG 960 

GGCTGCAAGG TTTTTCACCA CCACCACCAC CACCACCTCA ACTATTCAAA CAAAGGATGC 1020 

TCGACCAGAT CTTCCATTAC TGGTACATTG TCTTGCCATT OTTGGTCATT ATCAAGCAGA 1080 

TCGTGGCTCA TGCCAGGACC AATTATTTGA TGAAGAAGTT GGGCGCTAAG CCATTCACAC 1140 

ATGTCCAACT AGACGGGTGG TTTGGCTTCA AATTTGGCCG TGAATTCCTC AAAGCTAAAA 1200 
GTGCTGGGAG GCAGGTTGAT TT7UVTCATCT CCCCTTTCCA CGATAATGAG GACACTTTCT 1260 

CCAGCTATGC TTTTGGCAAC CATGTGGTGT TCACCAGGGA CCCCGAGAAT ATCAAGGCGC 1320 

TTTT66CAAC CCAGTTTGOT GATTTTTCAT TGGGAAGCAG GGTCAAATTC TTCAAACCAT 1380 

TCTTGGGGTA C6GTATCTTC ACCTTGGAC6 GCGAA6GCTG GAA6CACAGC AGAGCCATGT 1440 

TGAGACCACA GTTTGCCAGA GAGCAAGTTG CTCATGTGAC GTCGTTGGAA CCACATTTCC 1500 

AGTTGTTGAA GAAGCATATT CTTAAGCACA AGGGTGAATA CTTTGATATC CAGGAATTGT 1560 

TCTTTAGATT TACCGTTGAT TCAGCGACGG AGTTCTTATT TGGTGAGTCC GTGCACTCCT 1620 

TAA66GACX31 GGAAATTGGC TACX3ATACGA AGGACATGGC TGAAGAAAGA CGCAAATTIG 1680 

CCGACGCGTT CAACAAGTCG CAAGTCTATT TGTCCACCAG AGTTGCTTTA CAGACATTGT 1740 

ACTGGTTGOT CAACAACAAA GACHTCAAGG AGTGCAACGA CATTGTCCAC AAGTTCACCA 1800 

ACTACTATGT TCAGAAAGCC TTGGATGCTA CCCCAGAGGA ACTTGAAAAA CAAGGCGGGT 1860 

ATGTGTTCTT GTACGAGCTT GCCAAGCAGA CGAAAGACCC CAATGTGTTG CGTGACCA6T 1920 

CTTTGAACAT CTTGTTGGCT GGAAGGGACA CCACTGCTGG GTTGTTGTCC TTTGCTGTGT 1980 

TTGAGTTG6C CAGGAACCCA CACATCTGGG CCAAGTTGAG AGAGGAAATT GAATCACACT 2040 

TTGGGCTGGG TGAGGACTCT CX3TGTTGAAS AGATTACCTT TGAGABCTTG AAGAGATGTG 2100 

AGTACTTGAA AGCCGTGTTG AACGAAACGT TGAGATTACA CCCAAGTGTC CCAAGAAACG 2160 

CAAGATTTGC GATTAAAGAC ACX3ACTTTAC CAAGAGGCGG TGGCCCCAAC GGCAA6GATC 2220 

CTATCTTGAT CAGAAAGAAT GAGGTGGTGC AAXACTCCAT CTCGGCAACT CA6ACAAATC 2280 

CrGCTTATTA TGGCX3CCGAT GCTGCTGATT TTAGACCGGA AAGATCGTTT GA6CCATCAA 2340 

CTAGAAACTT GGGATGGGCTT TACTTGCCAT TCAACGGTGG TCCAAGAATC TGCTTGGGAC 2400 

AACAGTTTGC TTTGACCGAA GCCGGTTACG TTTTGGTTAG ACTTGTTCAG GAATTCCCTA 2460 

GCTTGTCACA GGACCCCGAA ACTGAGTACC CACCACCTAG ATTGGCACAC TTGACGATGT 2520 

GCTTGTTTGA CGGGGCATAC 6TCAAGAT6C AATAGGTTTT GGTTTGACTT TGTTTCCATA 2580 

TGCAAGTA6T TCAGTAATTA CACACTAATT TGTGGTGGCC GGCGATAAAT TACCGTTTGG 2640 

TTTTGTGTAA AAATTCGGAC ATCTCTGQTQ GTTTCCCTTC TCCGCAGCA3 CTTTGCCAOQ 2700 

GGTTTGCTCT GCGGCCAACA AATTCGAAAG GGGGGGGGGG GGGGGAGAAA GTTAACACCC 2760 

CCTGTTCCCA CCGTAGGCTG TA6CTCTTGT GGGGGGATGT AATTGTCGTA CGTTTTCATG 2820 

TTTGGCCCAG ACTTTGATGA TTACGTAGGC TTTCTTAT6T CTAAGGCGTG CTTGACACAA 2880 

GTGTCAAAAG GTGAGA6GC6 ACGTTATTCG ACATGAACGC AAAA6G6TAA TTTGCATCGA 2940 

TACGAGGGGT TGCCTCTGGT CTAAGAAGGA CCCCCCAGGT TGCAAACATG TTGCACTGCA 3000 

TCCCACTCAQ AGTTGGTCGA CCACGCXTTAC GCTTACCCTC AGCGATCGGC ACTTTCCGTT 3060 

GCTCAATATT TCTCTCCCCC CTGCTTCCCC CCATTGTTCC AGGGATTATC AACAACGTTG 3120 

CCGGTCTCCT CTCCCCCCCC TCCCCCCAGT TATGTACAAG AAAATTAAAT TGTCGCACGG 3180 

CACOGATACG TCAAAGATAC AGAGAAACCT TAATCXXTTCC CATAGCCTAG AAGCATCAAA 3240 

AAGATGATTG AGCAACTCCT CCAGTACTGG TACATTGCAC TCCCTGTATO GTTCATTCTC 3300 

CXXrrACGTGG CTTCCCACGC ACGAACCATC TACTT6CGCC ACAAGCTCGG CGCG6CX3CCG 3360 

TTCACGCACA CCCAGTACGA CGGATGGTAT GGGTTCAAGT TTGGGCGGGA GTTTCTCAAG 3420 

GCGAA6AA6A TTGGAAGGCA GACGGACTTG GTGCATGCGC GGTTCCGTGG AGGGGGCATG 3480 

GATACTTTCT CGAGCTATAC TTTOGGCATC CATATCATTC TTACTCGGGA CCCGGAGAAC 3540 

ATCAAGGCGC TCTTGGCGAC GCAGTTCGAT GACTTTTCG 3579 

(2) INFORMATION FOR SEQ ID NO: 94: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3348 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94: 

GATGTGGTGC TTGATTTCTC GAGACACATC CTTGTGAGGT GCCATGAATC TGTACCTGTC 60 

TGTAA6CACA GGGAACTGCT TCAACACCTT ATTGCATATT CTGTCTATTG CAAGCGTGTG 120 

CTGCAACGAT ATCTGCCAAG GTATATAGCA GAACGTGCTG ATGGTTCCTC CGGTCATATT 180 

CTGTTGGTAG TTCTGCAGGT AAATTTGGAT GTCAGGTAGT GGAGGGAGGT TTGTATCGGT 240 

TOTQTTTTCT TCTTCCTCTC TCTCTGATTC AACCTCCACG TCTCCTTCGG GTTCT6TGTC 300 

TGTGTCTGAO TCGTACTGTT GGATTAAGTC CATCGCAT6T GTGAAAAAAA GTAGCGCTTA 360 

TTTAGACAAC CAGTTC6TTG GGCGGGTATC AGAAATAGTC TGTTGTGCAC GACCATGAGT 420 
ATGCAACTTG ACGAGACGTC GTTA6GAATC CACA6AATGA TAGCAGGAA6 CTTACTACGT 480 

GAGAGATTCT GCTTAGAGGA TCTTCTCTTC TTGTTGATTC CATTAGGTGG GTATC3VTCTC 540 

CGGTG6TGAC AACTTGACAC AAGCAGTTCC GASAACXACC CACAACAATC ACCATTCCAG 600 

CTATCACTTC TACAT6TCAA CCTACX3AT6T ATCTCATCAC CATCTAGTTT CTTGGCAArC 660 

cyrrmu ' iXJT tatgggtcaa catccaaxac aactccacca atgjuusaaga aaaacggaaa 720 

GCAGAATACC AGAATGACAG TGTGAGTTCC TGACCATTGC TAATCTATGG CTATATCTA6 780 

TTTGCTATCX5 TGGGATGTGA TCTGTGTCGT CTTCATTTCC G'mXi'i'U ' rn ' ATTTCGGGTA 840 

TGAATATT6T TATACTAAAT ACTTGAT6CA CAAACAI6GC GCTCGASAAA TCGAGAATGT 900 

GAICAACGAT 6GGTTCTTTG GG^TCCGCTT ACCTTTGCTA CTCAT6GGA6 CCA6CAATGA 960 

GG6C0GACTT ATCGAGITCA GTGTCAAGAG ATTCGAGTCG GCGCCACATC CACASAACAA 1020 

GACATTGGTC AACCGG6CAT TGAGCGTTCC TGTGATACTC ACCMGGACC CAGTGAATAT 1080 

CAAAGCGATG CTATCGACCC AGTTTGATGA CTTTTCCCTT GGGTTGAGAC TACACCAGTT 1140 

TGCGCC6TTG TTGGGGAAAG 6CATCTTTAC TTTGGAC6GC CCAGA6TGGA A6CA6A8CCG 1200 

ATCTATGTTG C6TCCGCAAT TT6CCAAAQA TCGGGTTTCT CATATCCTGG ATCTAGAACC 1260 

GCATTTTGTO TTGCTTCGGA AGCACATTGA TGGCCACAAT GGAGACTACT TCGACATCCA 1320 

GGAGCTCTAC TTCCG6TTCT CGATGGATGT GGCGACGGGG rX ' ri ' XXJ ' m ' G GCGAGTCTGT 1380 

GGGGTCGTTG AAAGACGAAG ATGCGAGGTT CCTGGAAGCA TTCAATGAC3T CGCAGAAGTA 1440 

TTTGGCAACT AGGGCAACGT TGCACGAGTT GTACTTTCTT TGTGACGGGT TTAGGTTTCG 1500 

CCA6TACAAC AAGGTTGTGC GAAAGTTCTG CAGCCASTGT GTCCACAAGG CGTIAGATGT 1560 

T6CACCGGAA GACACCA6C6 A6TACGTGTT TCTCC6GGAG TTGGTCAAAC ACACTCGA6A 1620 

TCCCXrrTGTT TTACAAQACC AAGCGTTGAA CGTCTTGCTT 6CTGGAC6CG ACACCACC6C 1680 

GTCGTTATTA TCGTTTGCAA CATTTGAGCT AGCCCGGAAT GACCACATGT GGAGGAAGCT 1740 

ACGASAGGAG GTTATCCTGA CGATGGGACC GTCCAGTGAT GAAATAACCG TGGCCGGGTT 1800 

GAAGAGTTGC CGTTACCTCA AAC3CAATCCT AAACGAAACT CTTCGACTAT ACCGAAGrTGT 1860 

GCCTAGGAftC GGGAGATTTG CTACGA6GAA TACX3ACGCTT CCTCXTIGGCG GAGGTCCAQA 1920 

TOGATCGTTT CCGATTTTGA TAAGAAAGGG CCAGCCA6TG GGGTATTTCA TTTGTGCTAC! 1980 

ACACTTGAAT GASAAGGTAT ATGGGAATGA TA6CCATGTG TTTCGACCGG A6AGATGGGC 2040 

TGCXyrTAGAG GGCAAGAGTT TGGGCTGGTC GTATCTTCCA TTCAACGGCG 6CCCGAGAAG 2100 

CTGCCTTGGT CAGCAGTrTTG CAATCCTTGA AGCTTCGTAT GTTTTGGCTC GATTGACACA 2160 

OTGCTACACG AC6ATACA6C TTAOAACTAC CGAGTACCCA CCAAAGAAAC TCGTTCATCT 2220 

GACGATQAGT C^CTCAACG GGGTGTACAT CCQAACTAGA ACTTGATTAT 6TGTTTATG6 2280 

TTAATCGGGG CAAAGCACTG CAAGTCATTG ATGTTTGTGG AAGCCCAGCA TTGGTGTTCC 2340 

GGAGCATCAA TAACCAATGT CTTGAAGGGT TTGATTTTCT TGACCTTCTT CTTCCTGAGC 2400 

TTCTTTCCOT CAAACTTGTA CAGAATGGCC ATCATTTCAG GAACAACCAC GTACGACGGC 2460 

CGGTACCGCA TCTGGAGTAT CTC6CCGTCG TTCAAGTAGC ACX3AAAACAG CAACGACGTC 2520 

ACCATCTGCT TCCCAATCTT GACACCCACA GATACCCCTG CGGCTTCATG GATCAAATkAC 2580 

GT06GCAACC CCGCGTATAT GTCCATGTAA TTCTCCATGG CCACCTCCAT CAACACACTG 2640 

ATGGAGCGAC TGACGGTGCC ACCACTGCCC TCGGTTGA6T CAAGGCAGTA TGATGCCGGG 2700 

ATCCAGTACT CCAATGGGAA CCTCTGCACG GTGTCGCTGC AGTTTTTGAG GCGTATTTCG 2760 

ATCCATGATC GTTCTTTGGT GCTGTAGTAT AACGAGCTCT TGGTGTCCTT GAAATGGAAC 2820 

AGGTTG6ATG TGTTGTTGA6 TTTGTCTGCG TGCTTG6TTT 6CAAGTCTTC QATCGAGCGT 2880 

AOTQAGTAGA CAGTTGG066 GGGTGGTG6C TCGGGCTTTA TTCTGTGTTT 6TGTTTCCTT 2940 

CTTAGTCrrG GAATGACGCT GTTATCGAC6 GTTCGTAGTA TAA6TA6CGC CAATATGAGA 3000 

ATGTATATCC GCATCACCCA AGACTCTTCA GCCTGTTACA ACGACTGAGG CTGTTGGCCG 3060 

TGTGACCAAT TGGTTTCTTT GGT6ACCTA6 ATTGGTCCCG CAGGGAAAGC AAGGGCTGCT 3120 

AGGGGGGCAT ACCAAACAAG GTC6TGTAAT CA6TATCTAT GGTOCTACCA TGTGTGTGGT 3180 

TGGGG6QAAA TTCCCGGATT TTTGTGTAAC GAAAGTTCTA GAAAGTTCTC GTGGGTTCTG 3240 

AGAATCTGCT GGAACCATCC ACCCGCATTT CCGTTGCCAA AGTGGGAAGA GCAATCAACC 3300 

CACCCT6CTT TGCCCAATCA 6CCATTCCCC TGGGAATATA AATTCAAC 3348 

(2) INFORMATION FOR SEQ ID NO: 95:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 523 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown 
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95:

Met Ala Thr Gln Glu Ile Ile Asp Ser Val Leu Pro Tyr Leu Thr Lys
1               5                   10                  15

Trp Tyr Thr Val Ile Thr Ala Ala Val Leu Val Phe Leu Ile Ser Thr
            20                  25                  30

Asn Ile Lys Asn Tyr Val Lys Ala Lys Lys Leu Lys Cys Val Asp Pro
        35                  40                  45

Pro Tyr Leu Lys Asp Ala Gly Leu Thr Gly Ile Leu Ser Leu Ile Ala
    50                  55                  60

Ala Ile Lys Ala Lys Asn Asp Gly Arg Leu Ala Asn Phe Ala Asp Glu
65                  70                  75                  80

Val Phe Asp Glu Tyr Pro Asn His Thr Phe Tyr Leu Ser Val Ala Gly
                85                  90                  95

Ala Leu Lys Ile Val Met Thr Val Asp Pro Glu Asn Ile Lys Ala Val
            100                 105                 110

Leu Ala Thr Gln Phe Thr Asp Phe Ser Leu Gly Thr Arg His Ala His
        115                 120                 125

Phe Ala Pro Leu Leu Gly Asp Gly Ile Phe Thr Leu Asp Gly Glu Gly
    130                 135                 140

Trp Lys His Ser Arg Ala Met Leu Arg Pro Gln Phe Ala Arg Asp Gln
145                 150                 155                 160

Ile Gly His Val Lys Ala Leu Glu Pro His Ile Gln Ile Met Ala Lys
                165                 170                 175

Gln Ile Lys Leu Asn Gln Gly Lys Thr Phe Asp Ile Gln Glu Leu Phe
            180                 185                 190

Phe Arg Phe Thr Val Asp Thr Ala Thr Glu Phe Leu Phe Gly Glu Ser
        195                 200                 205

Val His Ser Leu Tyr Asp Glu Lys Leu Gly Ile Pro Thr Pro Asn Glu
    210                 215                 220

Ile Pro Gly Arg Glu Asn Phe Ala Ala Ala Phe Asn Val Ser Gln His
225                 230                 235                 240

Tyr Leu Ala Thr Arg Ser Tyr Ser Gln Thr Phe Tyr Phe Leu Thr Asn
                245                 250                 255

Pro Lys Glu Phe Arg Asp Cys Asn Ala Lys Val His His Leu Ala Lys
            260                 265                 270

Tyr Phe Val Asn Lys Ala Leu Asn Phe Thr Pro Glu Glu Leu Glu Glu
        275                 280                 285

Lys Ser Lys Ser Gly Tyr Val Phe Leu Tyr Glu Leu Val Lys Gln Thr
    290                 295                 300

Arg Asp Pro Lys Val Leu Gln Asp Gln Leu Leu Asn Ile Met Val Ala
305                 310                 315                 320

Gly Arg Asp Thr Thr Ala Gly Leu Leu Ser Phe Ala Leu Phe Glu Leu
                325                 330                 335

Ala Arg His Pro Glu Met Trp Ser Lys Leu Arg Glu Glu Ile Glu Val
            340                 345                 350

Asn Phe Gly Val Gly Glu Asp Ser Arg Val Glu Glu Ile Thr Phe Glu
        355                 360                 365

Ala Leu Lys Arg Cys Glu Tyr Leu Lys Ala Ile Leu Asn Glu Thr Leu
    370                 375                 380

Arg Met Tyr Pro Ser Val Pro Val Asn Phe Arg Thr Ala Thr Arg Asp
385                 390                 395                 400

Thr Thr Leu Pro Arg Gly Gly Gly Ala Asn Gly Thr Asp Pro Ile Tyr
                405                 410                 415

Ile Pro Lys Gly Ser Thr Val Ala Tyr Val Val Tyr Lys Thr His Arg
            420                 425                 430

Leu Glu Glu Tyr Tyr Gly Lys Asp Ala Asn Asp Phe Arg Pro Glu Arg
        435                 440                 445

Trp Phe Glu Pro Ser Thr Lys Lys Leu Gly Trp Ala Tyr Val Pro Phe
    450                 455                 460

Asn Gly Gly Pro Arg Val Cys Leu Gly Gln Gln Phe Ala Leu Thr Glu
465                 470                 475                 480

Ala Ser Tyr Val Ile Thr Arg Leu Ala Gln Met Phe Glu Thr Val Ser
                485                 490                 495

Ser Asp Pro Gly Leu Glu Tyr Pro Pro Pro Lys Cys Ile His Leu Thr
            500                 505                 510

Met Ser His Asn Asp Gly Val Phe Val Lys Met
        515                 520

(2) INFORMATION FOR SEQ ID NO: 96:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 522 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96: 



Met Thr Val His Asp Ile Ile Ala Thr Tyr Phe Thr Lys Trp Tyr Val
1               5                   10                  15

Ile Val Pro Leu Ala Leu Ile Ala Tyr Arg Val Leu Asp Tyr Phe Tyr
            20                  25                  30

Gly Arg Tyr Leu Met Tyr Lys Leu Gly Ala Lys Pro Phe Phe Gln Lys
        35                  40                  45

Gln Thr Asp Gly Cys Phe Gly Phe Lys Ala Pro Leu Glu Leu Leu Lys
    50                  55                  60

Lys Lys Ser Asp Gly Thr Leu Ile Asp Phe Thr Leu Gln Arg Ile His
65                  70                  75                  80

Asp Leu Asp Arg Pro Asp Ile Pro Thr Phe Thr Phe Pro Val Phe Ser
                85                  90                  95

Ile Asn Leu Val Asn Thr Leu Glu Pro Glu Asn Ile Lys Ala Ile Leu
            100                 105                 110

Ala Thr Gln Phe Asn Asp Phe Ser Leu Gly Thr Arg His Ser His Phe
        115                 120                 125

Ala Pro Leu Leu Gly Asp Gly Ile Phe Thr Leu Asp Gly Ala Gly Trp
    130                 135                 140

Lys His Ser Arg Ser Met Leu Arg Pro Gln Phe Ala Arg Glu Gln Ile
145                 150                 155                 160

Ser His Val Lys Leu Leu Glu Pro His Val Gln Val Phe Phe Lys His
                165                 170                 175

Val Arg Lys Ala Gln Gly Lys Thr Phe Asp Ile Gln Glu Leu Phe Phe
            180                 185                 190

Arg Leu Thr Val Asp Ser Ala Thr Glu Phe Leu Phe Gly Glu Ser Val
        195                 200                 205

Glu Ser Leu Arg Asp Glu Ser Ile Gly Met Ser Ile Asn Ala Leu Asp
    210                 215                 220

Phe Asp Gly Lys Ala Gly Phe Ala Asp Ala Phe Asn Tyr Ser Gln Asn
225                 230                 235                 240

Tyr Leu Ala Ser Arg Ala Val Met Gln Gln Leu Tyr Trp Val Leu Asn
                245                 250                 255

Gly Lys Lys Phe Lys Glu Cys Asn Ala Lys Val His Lys Phe Ala Asp
            260                 265                 270

Tyr 


Val 


Ash 


Lys 


Ala 




Tjou Thir Piro 


Glu Gin Leu 


Glu 
















?Rft 
^ 0 w 












Asp 


o±y 


lyr 


Val 






Rill T.011 Val 
UIXU JJfSU VdX 




Aig 


Asp 




9 OA 














3Aft 












i«eu 




ASp 




XJ6U nBu xxe 


MAt^ Val Ala 
vox Mxa 


taxy 


Arg 












■^1 ft 




lit; 






30 A 


Asp 


Jinr 


lor 






x<eu 


x<6u Ser 


PVio Val PVto 
IriXc VcLX irXlC 


IrUC \7XU J^CU 


Ala 


Arg 




















335 




Ash 




(It 11 


VAX 








Al ti Rill 
mX^^ uxu 


Tl A Rl 11 2Vfsn 

XXC UXU 












340 








345 


350 








Leu 






Ash 




der vax 


uxu Asp xxe 


C Ai^ Dt^ A #21 11 
aer iriie uxu 


Cat* 


lieu 














JO v 




365 






Lys 


oer 


Cys 


ur±u 


Tyr 


Leu 


jiys Axa 


VaI T.Ait &an 

vax iiBu ASH 


/21 11 T'^tr T.A11 
\jXU ICUt IjcU 


Arg 


Leu 
























Tyr 


Fro 


Ser 


vax 


Fro 


uJLXX 


Asn Phe 


Arg vax Axa 


ixir jjys ASu 


Tnr 


Tbr 


HOC 














3QI? 






AAA 
4UU 


Leu 


Pro 


Arg 


Gly 


Gly 


Gly 


Lys T^p 


Gly Leu Ser 


Fro vax lieu 


vax 


Arg 










A nc 






Al A 
4XU 




Al R 
4X9 




Lys 


Gly 


6Xn 


Thr 


vax 


xxe 


Tyr Gly 


vax Tyr Axa 


Vil ^ aj M 1V-M-J-W 

Ala Hxs Arg 


Asn 


Fro 








42Q 








AO C 

429 


A 

44U 






Axa 


vaij. 


Tyr 


Gly 


Lys 


Asp 


Ala Leu 


Glu Phe Arg 


Pro Glu Arg 


Trp 


Fne 






^ e 








AAA 




449 






Glu 


Fro 


GXU 


Tbr 


Liys 


Lys 


Leu Gly 


Trp Axa Fne 


Ajeu Fro Fne 


Asn 


r*i 
isxy 














ACQ 




460 






Gly Pro Arg Ile Cys Leu Gly Gln Gln Phe Ala Leu Thr Glu Ala Ser
465                 470                 475                 480

Tyr Val Thr Val Arg Leu Leu Gln Glu Phe Ala His Leu Ser Met Asp
                485                 490                 495

Pro Asp Thr Glu Tyr Pro Pro Lys Lys Met Ser His Leu Thr Met Ser
            500                 505                 510

Leu Phe Asp Gly Ala Asn Ile Glu Met Tyr
        515                 520

(2) INFORMATION FOR SEQ ID NO: 97: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 522 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97: 



Met 


Thr 


Ala 


Gin 


Asp 


He 


He 


Ala 


Thr Tyr He 


Thr Lys Trp 


Tyr 


Val 


1 








5 








10 




15 




He 


Val 


Pro 


Leu 


Ala 


Leu 


He 


Ala 


Tyr Arg Val 


Leu Asp Tyr 


Phe 


Tyr 








20 










25 


30 






Oly 


Arg 


Tyr 


Xieu 


Met 


Tyr 


Lys 


Leu 


Gly Ala Lys 


Pro Phe Phe 


Gin 


Lys 






35 










40 




45 






Gin 


Thr 


Asp 


Gly 


Tyr 


Phe 


Gly 


Phe 


Lys Ala Pro 


Leu Glu Leu 


lieu 


Lys 




50 










55 






60 






Lys 


Lys 


Ser 


Asp 


Gly 


Thr 


I<eu 


He 


Asp Phe Thr 


Leu Glu Arg 


He 


Gin 


65 










70 






75 






80 


Ala 


Leu 


Asn 


Arg 


Pro 


Asp 


He 


Pro 


Thr Phe Thr 


Phe Pro He 


Phe 


Ser 










85 








90 




95 




He 


Asn 


Leu 


He 


Ser 


Thr 


Leu 


Glu 


Pro Glu Asn 


He Lys Ala 


He 


lieu 








100 










105 


110 
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K 1 

AXa 


Thr 


Gin 


Phe 


Asn Asp 


Phe 


Ser 


Leu 


Gly 


Thr 


Arg 


His 


Ser 


His 


Phe 






115 








120 










125 








Ala Pro Leu Leu Gly Asp Gly Ile Phe Thr Leu Asp Gly Ala Gly Trp
130 135 140

Lys His Ser Arg Ser Met Leu Arg Pro Gln Phe Ala Arg Glu Gln Ile
145 150 155 160

Ser His Val Lys Leu Leu Glu Pro His Met Gln Val Phe Phe Lys His
165 170 175

Val Arg Lys Ala Gln Gly Lys Thr Phe Asp Ile Gln Glu Leu Phe Phe
180 185 190

Arg Leu Thr Val Asp Ser Ala Thr Glu Phe Leu Phe Gly Glu Ser Val
195 200 205

Glu Ser Leu Arg Asp Glu Ser Ile Gly Met Ser Ile Asn Ala Leu Asp
210 215 220

Phe Asp Gly Lys Ala Gly Phe Ala Asp Ala Phe Asn Tyr Ser Gln Asn
225 230 235 240

Tyr Leu Ala Ser Arg Ala Val Met Gln Gln Leu Tyr Trp Val Leu Asn
245 250 255

Gly Lys Lys Phe Lys Glu Cys Asn Ala Lys Val His Lys Phe Ala Asp
260 265 270

Tyr Tyr Val Ser Lys Ala Leu Asp Leu Thr Pro Glu Gln Leu Glu Lys
275 280 285

Gln Asp Gly Tyr Val Phe Leu Tyr Glu Leu Val Lys Gln Thr Arg Asp
290 295 300

Arg Gln Val Leu Arg Asp Gln Leu Leu Asn Ile Met Val Ala Gly Arg
305 310 315 320

Asp Thr Thr Ala Gly Leu Leu Ser Phe Val Phe Phe Glu Leu Ala Arg
325 330 335

Asn Pro Glu Val Thr Asn Lys Leu Arg Glu Glu Ile Glu Asp Lys Phe
340 345 350

Gly Leu Gly Glu Asn Ala Arg Val Glu Asp Ile Ser Phe Glu Ser Leu
355 360 365

Lys Ser Cys Glu Tyr Leu Lys Ala Val Leu Asn Glu Thr Leu Arg Leu
370 375 380

Tyr Pro Ser Val Pro Gln Asn Phe Arg Val Ala Thr Lys Asn Thr Thr
385 390 395 400

Leu Pro Arg Gly Gly Gly Lys Asp Gly Leu Ser Pro Val Leu Val Arg
405 410 415

Lys Gly Gln Thr Val Met Tyr Gly Val Tyr Ala Ala His Arg Asn Pro
420 425 430

Ala Val Tyr Gly Lys Asp Ala Leu Glu Phe Arg Pro Glu Arg Trp Phe
435 440 445

Glu Pro Glu Thr Lys Lys Leu Gly Trp Ala Phe Leu Pro Phe Asn Gly
450 455 460

Gly Pro Arg Ile Cys Leu Gly Gln Gln Phe Ala Leu Thr Glu Ala Ser
465 470 475 480

Tyr Val Thr Val Arg Leu Leu Gln Glu Phe Gly His Leu Ser Met Asp
485 490 495

Pro Asn Thr Glu Tyr Pro Pro Arg Lys Met Ser His Leu Thr Met Ser
500 505 510






Leu Phe Asp Gly Ala Asn Ile Glu Met Tyr
515 520



(2) INFORMATION FOR SEQ ID NO: 98: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 540 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:98: 



Met Ser Ser Ser Pro Ser Phe Ala Gln Glu Val Leu Ala Thr Thr Ser
1 5 10 15

Pro Tyr Ile Glu Tyr Phe Leu Asp Asn Tyr Thr Arg Trp Tyr Tyr Phe
20 25 30

Ile Pro Leu Val Leu Leu Ser Leu Asn Phe Ile Ser Leu Leu His Thr
35 40 45

Arg Tyr Leu Glu Arg Arg Phe His Ala Lys Pro Leu Gly Asn Phe Val
50 55 60

Arg Asp Pro Thr Phe Gly Ile Ala Thr Pro Leu Leu Leu Ile Tyr Leu
65 70 75 80

Lys Ser Lys Gly Thr Val Met Lys Phe Ala Trp Gly Leu Trp Asn Asn
85 90 95

Lys Tyr Ile Val Arg Asp Pro Lys Tyr Lys Thr Thr Gly Leu Arg Ile
100 105 110

Val Gly Leu Pro Leu Ile Glu Thr Met Asp Pro Glu Asn Ile Lys Ala
115 120 125

Val Leu Ala Thr Gln Phe Asn Asp Phe Ser Leu Gly Thr Arg His Asp
130 135 140

Phe Leu Tyr Ser Leu Leu Gly Asp Gly Ile Phe Thr Leu Asp Gly Ala
145 150 155 160

Gly Trp Lys His Ser Arg Thr Met Leu Arg Pro Gln Phe Ala Arg Glu
165 170 175

Gln Val Ser His Val Lys Leu Leu Glu Pro His Val Gln Val Phe Phe
180 185 190

Lys His Val Arg Lys His Arg Gly Gln Thr Phe Asp Ile Gln Glu Leu
195 200 205

Phe Phe Arg Leu Thr Val Asp Ser Ala Thr Glu Phe Leu Phe Gly Glu
210 215 220

Ser Ala Glu Ser Leu Arg Asp Glu Ser Ile Gly Leu Thr Pro Thr Thr
225 230 235 240

Lys Asp Phe Asp Gly Arg Arg Asp Phe Ala Asp Ala Phe Asn Tyr Ser
245 250 255

Gln Thr Tyr Gln Ala Tyr Arg Phe Leu Leu Gln Gln Met Tyr Trp Ile
260 265 270

Leu Asn Gly Ser Glu Phe Arg Lys Ser Ile Ala Val Val His Lys Phe
275 280 285

Ala Asp His Tyr Val Gln Lys Ala Leu Glu Leu Thr Asp Asp Asp Leu
290 295 300

Gln Lys Gln Asp Gly Tyr Val Phe Leu Tyr Glu Leu Ala Lys Gln Thr
305 310 315 320

Arg Asp Pro Lys Val Leu Arg Asp Gln Leu Leu Asn Ile Leu Val Ala
325 330 335

Gly Arg Asp Thr Thr Ala Gly Leu Leu Ser Phe Val Phe Tyr Glu Leu
340 345 350

Ser Arg Asn Pro Glu Val Phe Ala Lys Leu Arg Glu Glu Val Glu Asn
355 360 365

Arg Phe Gly Leu Gly Glu Glu Ala Arg Val Glu Glu Ile Ser Phe Glu
370 375 380

Ser Leu Lys Ser Cys Glu Tyr Leu Lys Ala Val Ile Asn Glu Thr Leu
385 390 395 400

Arg Leu Tyr Pro Ser Val Pro His Asn Phe Arg Val Ala Thr Arg Asn
405 410 415

Thr Thr Leu Pro Arg Gly Gly Gly Glu Asp Gly Tyr Ser Pro Ile Val
420 425 430

Val Lys Lys Gly Gln Val Val Met Tyr Thr Val Ile Ala Thr His Arg
435 440 445

Asp Pro Ser Ile Tyr Gly Ala Asp Ala Asp Val Phe Arg Pro Glu Arg
450 455 460

Trp Phe Glu Pro Glu Thr Arg Lys Leu Gly Trp Ala Tyr Val Pro Phe
465 470 475 480

Asn Gly Gly Pro Arg Ile Cys Leu Gly Gln Gln Phe Ala Leu Thr Glu
485 490 495

Ala Ser Tyr Val Thr Val Arg Leu Leu Gln Glu Phe Ala His Leu Ser
500 505 510

Met Asp Pro Asp Thr Glu Tyr Pro Pro Lys Leu Gln Asn Thr Leu Thr
515 520 525

Leu Ser Leu Phe Asp Gly Ala Asp Val Arg Met Tyr
530 535 540



(2) INFORMATION FOR SEQ ID NO: 99:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 540 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99:



Met Ser Ser Ser Pro Ser Phe Ala Gln Glu Val Leu Ala Thr Thr Ser
1 5 10 15

Pro Tyr Ile Glu Tyr Phe Leu Asp Asn Tyr Thr Arg Trp Tyr Tyr Phe
20 25 30

Ile Pro Leu Val Leu Leu Ser Leu Asn Phe Ile Ser Leu Leu His Thr
35 40 45

Lys Tyr Leu Glu Arg Arg Phe His Ala Lys Pro Leu Gly Asn Val Val
50 55 60

Leu Asp Pro Thr Phe Gly Ile Ala Thr Pro Leu Ile Leu Ile Tyr Leu
65 70 75 80

Lys Ser Lys Gly Thr Val Met Lys Phe Ala Trp Ser Phe Trp Asn Asn
85 90 95

Lys Tyr Ile Val Lys Asp Pro Lys Tyr Lys Thr Thr Gly Leu Arg Ile
100 105 110

Val Gly Leu Pro Leu Ile Glu Thr Ile Asp Pro Glu Asn Ile Lys Ala
115 120 125

Val Leu Ala Thr Gln Phe Asn Asp Phe Ser Leu Gly Thr Arg His Asp
130 135 140

Phe Leu Tyr Ser Leu Leu Gly Asp Gly Ile Phe Thr Leu Asp Gly Ala
145 150 155 160

Gly Trp Lys His Ser Arg Thr Met Leu Arg Pro Gln Phe Ala Arg Glu
165 170 175

Gln Val Ser His Val Lys Leu Leu Glu Pro His Val Gln Val Phe Phe
180 185 190

Lys His Val Arg Lys His Arg Gly Gln Thr Phe Asp Ile Gln Glu Leu
195 200 205

Phe Phe Arg Leu Thr Val Asp Ser Ala Thr Glu Phe Leu Phe Gly Glu
210 215 220

Ser Ala Glu Ser Leu Arg Asp Asp Ser Val Gly Leu Thr Pro Thr Thr
225 230 235 240

Lys Asp Phe Glu Gly Arg Gly Asp Phe Ala Asp Ala Phe Asn Tyr Ser
245 250 255

Gln Thr Tyr Gln Ala Tyr Arg Phe Leu Leu Gln Gln Met Tyr Trp Ile
260 265 270

Leu Asn Gly Ala Glu Phe Arg Lys Ser Ile Ala Ile Val His Lys Phe
275 280 285

Ala Asp His Tyr Val Gln Lys Ala Leu Glu Leu Thr Asp Asp Asp Leu
290 295 300

Gln Lys Gln Asp Gly Tyr Val Phe Leu Tyr Glu Leu Ala Lys Gln Thr
305 310 315 320

Arg Asp Pro Lys Val Leu Arg Asp Gln Leu Leu Asn Ile Leu Val Ala
325 330 335

Gly Arg Asp Thr Thr Ala Gly Leu Leu Ser Phe Val Phe Tyr Glu Leu
340 345 350

Ser Arg Asn Pro Glu Val Phe Ala Lys Leu Arg Glu Glu Val Glu Asn
355 360 365

Arg Phe Gly Leu Gly Glu Glu Ala Arg Val Glu Glu Ile Ser Phe Glu
370 375 380

Ser Leu Lys Ser Cys Glu Tyr Leu Lys Ala Val Ile Asn Glu Ala Leu
385 390 395 400

Arg Leu Tyr Pro Ser Val Pro His Asn Phe Arg Val Ala Thr Arg Asn
405 410 415

Thr Thr Leu Pro Arg Gly Gly Gly Lys Asp Gly Cys Ser Pro Ile Val
420 425 430

Val Lys Lys Gly Gln Val Val Met Tyr Thr Val Ile Gly Thr His Arg
435 440 445

Asp Pro Ser Ile Tyr Gly Ala Asp Ala Asp Val Phe Arg Pro Glu Arg
450 455 460

Trp Phe Glu Pro Glu Thr Arg Lys Leu Gly Trp Ala Tyr Val Pro Phe
465 470 475 480

Asn Gly Gly Pro Arg Ile Cys Leu Gly Gln Gln Phe Ala Leu Thr Glu
485 490 495

Ala Ser Tyr Val Thr Val Arg Leu Leu Gln Glu Phe Gly Asn Leu Ser
500 505 510

Leu Asp Pro Asn Ala Glu Tyr Pro Pro Lys Leu Gln Asn Thr Leu Thr
515 520 525

Leu Ser Leu Phe Asp Gly Ala Asp Val Arg Met Phe
530 535 540

(2) INFORMATION FOR SEQ ID NO: 100: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 517 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100: 

Met Ile Glu Gln Leu Leu Glu Tyr Trp Tyr Val Val Val Pro Val Leu
1 5 10 15

Tyr Ile Ile Lys Gln Leu Leu Ala Tyr Thr Lys Thr Arg Val Leu Met
20 25 30

Lys Lys Leu Gly Ala Ala Pro Val Thr Asn Lys Leu Tyr Asp Asn Ala
35 40 45

Phe Gly Ile Val Asn Gly Trp Lys Ala Leu Gln Phe Lys Lys Glu Gly
50 55 60

Arg Ala Gln Glu Tyr Asn Asp Tyr Lys Phe Asp His Ser Lys Asn Pro
65 70 75 80

Ser Val Gly Thr Tyr Val Ser Ile Leu Phe Gly Thr Arg Ile Val Val
85 90 95

Thr Lys Asp Pro Glu Asn Ile Lys Ala Ile Leu Ala Thr Gln Phe Gly
100 105 110

Asp Phe Ser Leu Gly Lys Arg His Thr Leu Phe Lys Pro Leu Leu Gly
115 120 125

Asp Gly Ile Phe Thr Leu Asp Gly Glu Gly Trp Lys His Ser Arg Ala
130 135 140

Met Leu Arg Pro Gln Phe Ala Arg Glu Gln Val Ala His Val Thr Ser
145 150 155 160

Leu Glu Pro His Phe Gln Leu Leu Lys Lys His Ile Leu Lys His Lys
165 170 175

Gly Glu Tyr Phe Asp Ile Gln Glu Leu Phe Phe Arg Phe Thr Val Asp
180 185 190

Ser Ala Thr Glu Phe Leu Phe Gly Glu Ser Val His Ser Leu Lys Asp
195 200 205

Glu Ser Ile Gly Ile Asn Gln Asp Asp Ile Asp Phe Ala Gly Arg Lys
210 215 220

Asp Phe Ala Glu Ser Phe Asn Lys Ala Gln Glu Tyr Leu Ala Ile Arg
225 230 235 240

Thr Leu Val Gln Thr Phe Tyr Trp Leu Val Asn Asn Lys Glu Phe Arg
245 250 255

Asp Cys Thr Lys Leu Val His Lys Phe Thr Asn Tyr Tyr Val Gln Lys
260 265 270

Ala Leu Asp Ala Ser Pro Glu Glu Leu Glu Lys Gln Ser Gly Tyr Val
275 280 285

Phe Leu Tyr Glu Leu Val Lys Gln Thr Arg Asp Pro Asn Val Leu Arg
290 295 300

Asp Gln Ser Leu Asn Ile Leu Leu Ala Gly Arg Asp Thr Thr Ala Gly
305 310 315 320

Leu Leu Ser Phe Ala Val Phe Glu Leu Ala Arg His Pro Glu Ile Trp
325 330 335

Ala Lys Leu Arg Glu Glu Ile Glu Gln Gln Phe Gly Leu Gly Glu Asp
340 345 350

Ser Arg Val Glu Glu Ile Thr Phe Glu Ser Leu Lys Arg Cys Glu Tyr
355 360 365

Leu Lys Ala Phe Leu Asn Glu Thr Leu Arg Ile Tyr Pro Ser Val Pro
370 375 380

Arg Asn Phe Arg Ile Ala Thr Lys Asn Thr Thr Leu Pro Arg Gly Gly
385 390 395 400

Gly Ser Asp Gly Thr Ser Pro Ile Leu Ile Gln Lys Gly Glu Ala Val
405 410 415

Ser Tyr Gly Ile Asn Ser Thr His Leu Asp Pro Val Tyr Tyr Gly Pro
420 425 430

Asp Ala Ala Glu Phe Arg Pro Glu Arg Trp Phe Glu Pro Ser Thr Lys
435 440 445

Lys Leu Gly Trp Ala Tyr Leu Pro Phe Asn Gly Gly Pro Arg Ile Cys
450 455 460

Leu Gly Gln Gln Phe Ala Leu Thr Glu Ala Gly Tyr Val Leu Val Arg
465 470 475 480

Leu Val Gln Glu Phe Ser His Val Arg Leu Asp Pro Asp Glu Val Tyr
485 490 495

Pro Pro Lys Arg Leu Thr Asn Leu Thr Met Cys Leu Gln Asp Gly Ala
500 505 510

Ile Val Lys Phe Asp
515

(2) INFORMATION FOR SEQ ID NO: 101: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 517 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101: 



Met Ile Glu Gln Ile Leu Glu Tyr Trp Tyr Ile Val Val Pro Val Leu
1 5 10 15

Tyr Ile Ile Lys Gln Leu Ile Ala Tyr Ser Lys Thr Arg Val Leu Met
20 25 30

Lys Gln Leu Gly Ala Ala Pro Ile Thr Asn Gln Leu Tyr Asp Asn Val
35 40 45

Phe Gly Ile Val Asn Gly Trp Lys Ala Leu Gln Phe Lys Lys Glu Gly
50 55 60

Arg Ala Gln Glu Tyr Asn Asp His Lys Phe Asp Ser Ser Lys Asn Pro
65 70 75 80

Ser Val Gly Thr Tyr Val Ser Ile Leu Phe Gly Thr Lys Ile Val Val
85 90 95

Thr Lys Asp Pro Glu Asn Ile Lys Ala Ile Leu Ala Thr Gln Phe Gly
100 105 110

Asp Phe Ser Leu Gly Lys Arg His Ala Leu Phe Lys Pro Leu Leu Gly
115 120 125

Asp Gly Ile Phe Thr Leu Asp Gly Glu Gly Trp Lys His Ser Arg Ser
130 135 140

Met Leu Arg Pro Gln Phe Ala Arg Glu Gln Val Ala His Val Thr Ser
145 150 155 160

Leu Glu Pro His Phe Gln Leu Leu Lys Lys His Ile Leu Lys His Lys
165 170 175

Gly Glu Tyr Phe Asp Ile Gln Glu Leu Phe Phe Arg Phe Thr Val Asp
180 185 190

Ser Ala Thr Glu Phe Leu Phe Gly Glu Ser Val His Ser Leu Lys Asp
195 200 205

Glu Thr Ile Gly Ile Asn Gln Asp Asp Ile Asp Phe Ala Gly Arg Lys
210 215 220

Asp Phe Ala Glu Ser Phe Asn Lys Ala Gln Glu Tyr Leu Ser Ile Arg
225 230 235 240

Ile Leu Val Gln Thr Phe Tyr Trp Leu Ile Asn Asn Lys Glu Phe Arg
245 250 255

Asp Cys Thr Lys Leu Val His Lys Phe Thr Asn Tyr Tyr Val Gln Lys
260 265 270

Ala Leu Asp Ala Thr Pro Glu Glu Leu Glu Lys Gln Gly Gly Tyr Val
275 280 285

Phe Leu Tyr Glu Leu Val Lys Gln Thr Arg Asp Pro Lys Val Leu Arg
290 295 300

Asp Gln Ser Leu Asn Ile Leu Leu Ala Gly Arg Asp Thr Thr Ala Gly
305 310 315 320

Leu Leu Ser Phe Ala Val Phe Glu Leu Ala Arg Asn Pro His Ile Trp
325 330 335

Ala Lys Leu Arg Glu Glu Ile Glu Gln Gln Phe Gly Leu Gly Glu Asp
340 345 350

Ser Arg Val Glu Glu Ile Thr Phe Glu Ser Leu Lys Arg Cys Glu Tyr
355 360 365

Leu Lys Ala Phe Leu Asn Glu Thr Leu Arg Val Tyr Pro Ser Val Pro
370 375 380

Arg Asn Phe Arg Ile Ala Thr Lys Asn Thr Thr Leu Pro Arg Gly Gly
385 390 395 400

Gly Pro Asp Gly Thr Gln Pro Ile Leu Ile Gln Lys Gly Glu Gly Val
405 410 415

Ser Tyr Gly Ile Asn Ser Thr His Leu Asp Pro Val Tyr Tyr Gly Pro
420 425 430

Asp Ala Ala Glu Phe Arg Pro Glu Arg Trp Phe Glu Pro Ser Thr Arg
435 440 445

Lys Leu Gly Trp Ala Tyr Leu Pro Phe Asn Gly Gly Pro Arg Ile Cys
450 455 460

Leu Gly Gln Gln Phe Ala Leu Thr Glu Ala Gly Tyr Val Leu Val Arg
465 470 475 480

Leu Val Gln Glu Phe Ser His Ile Arg Leu Asp Pro Asp Glu Val Tyr
485 490 495

Pro Pro Lys Arg Leu Thr Asn Leu Thr Met Cys Leu Gln Asp Gly Ala
500 505 510

Ile Val Lys Phe Asp
515

















(2) INFORMATION FOR SEQ ID NO: 102: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 512 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102: 



Met Leu Asp Gln Ile Leu His Tyr Trp Tyr Ile Val Leu Pro Leu Leu
1 5 10 15

Ala Ile Ile Asn Gln Ile Val Ala His Val Arg Thr Asn Tyr Leu Met
20 25 30

Lys Lys Leu Gly Ala Lys Pro Phe Thr His Val Gln Arg Asp Gly Trp
35 40 45

Leu Gly Phe Lys Phe Gly Arg Glu Phe Leu Lys Ala Lys Ser Ala Gly
50 55 60

Arg Leu Val Asp Leu Ile Ile Ser Arg Phe His Asp Asn Glu Asp Thr
65 70 75 80

Phe Ser Ser Tyr Ala Phe Gly Asn His Val Val Phe Thr Arg Asp Pro
85 90 95

Glu Asn Ile Lys Ala Leu Leu Ala Thr Gln Phe Gly Asp Phe Ser Leu
100 105 110

Gly Ser Arg Val Lys Phe Phe Lys Pro Leu Leu Gly Tyr Gly Ile Phe
115 120 125

Thr Leu Asp Ala Glu Gly Trp Lys His Ser Arg Ala Met Leu Arg Pro
130 135 140

Gln Phe Ala Arg Glu Gln Val Ala His Val Thr Ser Leu Glu Pro His
145 150 155 160

Phe Gln Leu Leu Lys Lys His Ile Leu Lys His Lys Gly Glu Tyr Phe
165 170 175
Asp Ile Gln Glu Leu Phe Phe Arg Phe Thr Val Asp Ser Ala Thr Glu
180 185 190






DVia 


lieu 










Vox 




OCX Ajeu iiys 


ASp \9XU %9±U 


x±e 


uxy 
















2uO 




2D9 






Tyr 


Asp 


tHir 


Lys 


Asp 


neu 


Ser 


%alXL 


Glu Arg Arg 


Arg Fne Axa 


Asp 


Ai.a 




210 










215 






220 






Phe Asn Lys Ser Gln Val Tyr Val Ala Thr Arg Val Ala Leu Gln Asn
225 230 235 240

Leu Tyr Trp Leu Val Asn Asn Lys Glu Phe Lys Glu Cys Asn Asp Ile
245 250 255

Val His Lys Phe Thr Asn Tyr Tyr Val Gln Lys Ala Leu Asp Ala Thr
260 265 270

Pro Glu Glu Leu Glu Lys Gln Gly Gly Tyr Val Phe Leu Tyr Glu Leu
275 280 285

Val Lys Gln Thr Arg Asp Pro Lys Val Leu Arg Asp Gln Ser Leu Asn
290 295 300

Ile Leu Leu Ala Gly Arg Asp Thr Thr Ala Gly Leu Leu Ser Phe Ala
305 310 315 320

Val Phe Glu Leu Ala Arg Asn Pro His Ile Trp Ala Lys Leu Arg Glu
325 330 335

Glu Ile Glu Gln Gln Phe Gly Leu Gly Glu Asp Ser Arg Val Glu Glu
340 345 350

Ile Thr Phe Glu Ser Leu Lys Arg Cys Glu Tyr Leu Lys Ala Val Leu
355 360 365

Asn Glu Thr Leu Arg Leu His Pro Ser Val Pro Arg Asn Ala Arg Phe
370 375 380

Ala Ile Lys Asp Thr Thr Leu Pro Arg Gly Gly Gly Pro Asn Gly Lys
385 390 395 400

Asp Pro Ile Leu Ile Arg Lys Asp Glu Val Val Gln Tyr Ser Ile Ser
405 410 415

Ala Thr Gln Thr Asn Pro Ala Tyr Tyr Gly Ala Asp Ala Ala Asp Phe
420 425 430

Arg Pro Glu Arg Trp Phe Glu Pro Ser Thr Arg Asn Leu Gly Trp Ala
435 440 445

Phe Leu Pro Phe Asn Gly Gly Pro Arg Ile Cys Leu Gly Gln Gln Phe
450 455 460

Ala Leu Thr Glu Ala Gly Tyr Val Leu Val Arg Leu Val Gln Glu Phe
465 470 475 480

Pro Asn Leu Ser Gln Asp Pro Glu Thr Lys Tyr Pro Pro Pro Arg Leu
485 490 495

Ala His Leu Thr Met Cys Leu Phe Asp Gly Ala His Val Lys Met Ser
500 505 510







(2) INFORMATION FOR SEQ ID NO: 103:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 512 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: unknown 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103: 
Met Leu Asp Gln Ile Phe His Tyr Trp Tyr Ile Val Leu Pro Leu Leu
1 5 10 15

Val Ile Ile Lys Gln Ile Val Ala His Ala Arg Thr Asn Tyr Leu Met
20 25 30

Lys Lys Leu Gly Ala Lys Pro Phe Thr His Val Gln Leu Asp Gly Trp
35 40 45

Phe Gly Phe Lys Phe Gly Arg Glu Phe Leu Lys Ala Lys Ser Ala Gly
50 55 60

Arg Gln Val Asp Leu Ile Ile Ser Arg Phe His Asp Asn Glu Asp Thr
65 70 75 80

Phe Ser Ser Tyr Ala Phe Gly Asn His Val Val Phe Thr Arg Asp Pro
85 90 95

Glu Asn Ile Lys Ala Leu Leu Ala Thr Gln Phe Gly Asp Phe Ser Leu
100 105 110

Gly Ser Arg Val Lys Phe Phe Lys Pro Leu Leu Gly Tyr Gly Ile Phe
115 120 125

Thr Leu Asp Gly Glu Gly Trp Lys His Ser Arg Ala Met Leu Arg Pro
130 135 140

Gln Phe Ala Arg Glu Gln Val Ala His Val Thr Ser Leu Glu Pro His
145 150 155 160

Phe Gln Leu Leu Lys Lys His Ile Leu Lys His Lys Gly Glu Tyr Phe
165 170 175

Asp Ile Gln Glu Leu Phe Phe Arg Phe Thr Val Asp Ser Ala Thr Glu
180 185 190

Phe Leu Phe Gly Glu Ser Val His Ser Leu Arg Asp Glu Glu Ile Gly
195 200 205

Tyr Asp Thr Lys Asp Met Ala Glu Glu Arg Arg Lys Phe Ala Asp Ala
210 215 220

Phe Asn Lys Ser Gln Val Tyr Leu Ser Thr Arg Val Ala Leu Gln Thr
225 230 235 240

Leu Tyr Trp Leu Val Asn Asn Lys Glu Phe Lys Glu Cys Asn Asp Ile
245 250 255

Val His Lys Phe Thr Asn Tyr Tyr Val Gln Lys Ala Leu Asp Ala Thr
260 265 270

Pro Glu Glu Leu Glu Lys Gln Gly Gly Tyr Val Phe Leu Tyr Glu Leu
275 280 285

Ala Lys Gln Thr Lys Asp Pro Asn Val Leu Arg Asp Gln Ser Leu Asn
290 295 300

Ile Leu Leu Ala Gly Arg Asp Thr Thr Ala Gly Leu Leu Ser Phe Ala
305 310 315 320

Val Phe Glu Leu Ala Arg Asn Pro His Ile Trp Ala Lys Leu Arg Glu
325 330 335

Glu Ile Glu Ser His Phe Gly Leu Gly Glu Asp Ser Arg Val Glu Glu
340 345 350

Ile Thr Phe Glu Ser Leu Lys Arg Cys Glu Tyr Leu Lys Ala Val Leu
355 360 365

Asn Glu Thr Leu Arg Leu His Pro Ser Val Pro Arg Asn Ala Arg Phe
370 375 380

Ala Ile Lys Asp Thr Thr Leu Pro Arg Gly Gly Gly Pro Asn Gly Lys
385 390 395 400

Asp Pro Ile Leu Ile Arg Lys Asn Glu Val Val Gln Tyr Ser Ile Ser
405 410 415

Ala Thr Gln Thr Asn Pro Ala Tyr Tyr Gly Ala Asp Ala Ala Asp Phe
420 425 430

Arg Pro Glu Arg Trp Phe Glu Pro Ser Thr Arg Asn Leu Gly Trp Ala
435 440 445

Tyr Leu Pro Phe Asn Gly Gly Pro Arg Ile Cys Leu Gly Gln Gln Phe
450 455 460

Ala Leu Thr Glu Ala Gly Tyr Val Leu Val Arg Leu Val Gln Glu Phe
465 470 475 480

Pro Ser Leu Ser Gln Asp Pro Glu Thr Glu Tyr Pro Pro Pro Arg Leu
485 490 495

Ala His Leu Thr Met Cys Leu Phe Asp Gly Ala Tyr Val Lys Met Gln
500 505 510



(2) INFORMATION FOR SEQ ID NO: 104:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 499 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104: 



Met Ala Ile Ser Ser Leu Leu Ser Trp Asp Val Ile Cys Val Val Phe
1 5 10 15

Ile Cys Val Cys Val Tyr Phe Gly Tyr Glu Tyr Cys Tyr Thr Lys Tyr
20 25 30

Leu Met His Lys His Gly Ala Arg Glu Ile Glu Asn Val Ile Asn Asp
35 40 45

Gly Phe Phe Gly Phe Arg Leu Pro Leu Leu Leu Met Arg Ala Ser Asn
50 55 60

Glu Gly Arg Leu Ile Glu Phe Ser Val Lys Arg Phe Glu Ser Ala Pro
65 70 75 80

His Pro Gln Asn Lys Thr Leu Val Asn Arg Ala Leu Ser Val Pro Val
85 90 95

Ile Leu Thr Lys Asp Pro Val Asn Ile Lys Ala Met Leu Ser Thr Gln
100 105 110

Phe Asp Asp Phe Ser Leu Gly Leu Arg Leu His Gln Phe Ala Pro Leu
115 120 125

Leu Gly Lys Gly Ile Phe Thr Leu Asp Gly Pro Glu Trp Lys Gln Ser
130 135 140

Arg Ser Met Leu Arg Pro Gln Phe Ala Lys Asp Arg Val Ser His Ile
145 150 155 160

Leu Asp Leu Glu Pro His Phe Val Leu Leu Arg Lys His Ile Asp Gly
165 170 175

His Asn Gly Asp Tyr Phe Asp Ile Gln Glu Leu Tyr Phe Arg Phe Ser
180 185 190

Met Asp Val Ala Thr Gly Phe Leu Phe Gly Glu Ser Val Gly Ser Leu
195 200 205

Lys Asp Glu Asp Ala Arg Phe Leu Glu Ala Phe Asn Glu Ser Gln Lys
210 215 220

Tyr Leu Ala Thr Arg Ala Thr Leu His Glu Leu Tyr Phe Leu Cys Asp
225 230 235 240

Gly Phe Arg Phe Arg Gln Tyr Asn Lys Val Val Arg Lys Phe Cys Ser
245 250 255

Gln Cys Val His Lys Ala Leu Asp Val Ala Pro Glu Asp Thr Ser Glu
260 265 270

Tyr Val Phe Leu Arg Glu Leu Val Lys His Thr Arg Asp Pro Val Val
275 280 285

Leu Gln Asp Gln Ala Leu Asn Val Leu Leu Ala Gly Arg Asp Thr Thr
290 295 300

Ala Ser Leu Leu Ser Phe Ala Thr Phe Glu Leu Ala Arg Asn Asp His
305 310 315 320

Met Trp Arg Lys Leu Arg Glu Glu Val Ile Leu Thr Met Gly Pro Ser
325 330 335

Ser Asp Glu Ile Thr Val Ala Gly Leu Lys Ser Cys Arg Tyr Leu Lys
340 345 350

Ala Ile Leu Asn Glu Thr Leu Arg Leu Tyr Pro Ser Val Pro Arg Asn
355 360 365

Ala Arg Phe Ala Thr Arg Asn Thr Thr Leu Pro Arg Gly Gly Gly Pro
370 375 380

Asp Gly Ser Phe Pro Ile Leu Ile Arg Lys Gly Gln Pro Val Gly Tyr
385 390 395 400

Phe Ile Cys Ala Thr His Leu Asn Glu Lys Val Tyr Gly Asn Asp Ser
405 410 415

His Val Phe Arg Pro Glu Arg Trp Ala Ala Leu Glu Gly Lys Ser Leu
420 425 430

Gly Trp Ser Tyr Leu Pro Phe Asn Gly Gly Pro Arg Ser Cys Leu Gly
435 440 445

Gln Gln Phe Ala Ile Leu Glu Ala Ser Tyr Val Leu Ala Arg Leu Thr
450 455 460

Gln Cys Tyr Thr Thr Ile Gln Leu Arg Thr Thr Glu Tyr Pro Pro Lys
465 470 475 480

Lys Leu Val His Leu Thr Met Ser Leu Leu Asn Gly Val Tyr Ile Arg
485 490 495

Thr Arg Thr



















(2) INFORMATION FOR SEQ ID NO: 105: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1712 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105: 

GGTACCGAGC TCACGAGTTT TGGGATTTTC GAGTTTGGAT TGTTTCCTTT GTTGATTGAA 60

TTGACGAAAC CAGAGGTTTT CAAGACAGAT AAGATTGGGT TTATCAAAAC GCAGTTTGAA 120

ATATTCCAGT TGGTTTCCAA GATATCTTGA AGAAGAnGA CGATTTGAAA TTTGAAGAAG 180

TGGAGAAGAT CTGGTTTGGA TTGTTGGAGA ATTTCAAGAA TCTCAAGATT TACTCTAACO 240

ACGGGTACAA CGAGAATTGT ATTGAATTGA TCAAGAACAT GATCTTGGTG TTACAGAACA 300

TCAAGTTCTT GGACCAGACT GAGAATGCCA CAGATATACA AGGCGTCATG TGATAAAATG 360

GATGAGATTT ATCCCACAAT TGAAGAAAGA GTTTATGGAA AGTGGTCAAC CAGAAGCTAA 420

ACAGGAAGAA GCAAACGAAG AGGTGAAACA AGAAGAAGAA GGTAAATAAG TATTTTGTAT 480

TATATAACAA ACAAAGTAAG GAATACAGAT TTATACAATA AATTGCCATA CTAGTCACGT 540

GAGATATCTC ATCCATTCCC CAACTCCCAA GAAAAAAAAA AAGTGAAAAA AAAAATCAAA 600

CCCAAAGATC AACCTCCCCA TCATCATCGT CATCAAACCC CCAGCTCAAT TCGCAATGGT 660

TAGCACAAAA ACATACACAG AAAGGGCATC AGCACACCCC TCCAAGGTTG CCCAACGTTT 720

ATTCCGCTTA ATGGAGTCCA AAAAGACCAA CCTCTGCGCC TCGATCGACG TGACCACAAC 780

CGCCGAGTTC CTTTCGCTCA TCGACAAGCT CGGTCCCCAC ATCTGTCTCG TGAAGACGCA 840

CATCGATATC ATCTCAGACT TCAGCTACGA GGGCACGATT GAGCCGTTGC TTGTGCTTGC 900

AGAGCGCCAC GGGTTCTTGA TATTCGAGGA CAGGAAGTTT GCTGATATCG GAAACACCGT 960

GATGTTGCAG TACACCTCGG GGGTATACCG GATCGCGGCG TGGAGTGACA TCACGAACGC 1020

GCACGGAGTG ACTGGGAAGG GCGTCGTTGA AGGGTTGAAA CGCGGTGCGG AGGGGGTAGA 1080

AAAGGAAAGG GGCGTGTTGA TGTTGGCGGA GTTGTCGAGT AAAGGCTCGT TGGCGCATGG 1140

TGAATATACC CGTGAGACGA TCGAGATTGC GAAGAGTGAT CGGGAGTTCG TGATTGGGTT 1200

CATCGCGCAG CGGGACATGG GGGGTAGAGA AGAAGGGTTT GATTGGATCA TCATGACGCC 1260

TGGTGTGGGG TTGGATGATA AAGGCGATGC GTTGGGCCAG CAGTATAGGA CTGTTGATGA 1320

GGTGGTTCTG ACTGGTACCG ATGTGATTAT TGTCGGGAGA GGGTTGTTTG GAAAAGGAAG 1380

AGACCCTGAG GTGGAGGGAA AGAGATACAG GGATGCTGGA TGGAAGGCAT ACTTGAAGAG 1440

AACTGGTCAG TTAGAATAAA TATTGTAATA AATAGGTCTA TATACATACA CTAAGCTTCT 1500

AGGACGTCAT TGTAGTCTTC GAAGTTGTCT GCTAGTTTAG TTCTCATGAT TTCGAAAACC 1560

AATAACX3CAA TGGATGTAGC AGGGATGGTG GTTACTG0GT TCCTGACAAH CCCAGAGTAC 1620

GCCGCCTCAA ACCACGTCAC ATTCGCCCTT TGCTTCIATCC GCATCACTTG CTTGAAGOTA 1680

TCCACX5TACG AGTTGTAATA CACCTTGAAG AA 1712

(2) INFORMATION FOR SEQ ID NO: 106:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 267 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: unknown

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106:



Met Val Ser Thr Lys Thr Tyr Thr Glu Arg Ala Ser Ala His Pro Ser
1               5                   10                  15

Lys Val Ala Gln Arg Leu Phe Arg Leu Met Glu Ser Lys Lys Thr Asn
            20                  25                  30

Leu Cys Ala Ser Ile Asp Val Thr Thr Thr Ala Glu Phe Leu Ser Leu
        35                  40                  45

Ile Asp Lys Leu Gly Pro His Ile Cys Leu Val Lys Thr His Ile Asp
    50                  55                  60

Ile Ile Ser Asp Phe Ser Tyr Glu Gly Thr Ile Glu Pro Leu Leu Val
65                  70                  75                  80

Leu Ala Glu Arg His Gly Phe Leu Ile Phe Glu Asp Arg Lys Phe Ala
                85                  90                  95

Asp Ile Gly Asn Thr Val Met Leu Gln Tyr Thr Ser Gly Val Tyr Arg
            100                 105                 110

Ile Ala Ala Trp Ser Asp Ile Thr Asn Ala His Gly Val Thr Gly Lys
        115                 120                 125


Gly Val Val Glu Gly Leu Lys Arg Gly Ala Glu Gly Val Glu Lys Glu
    130                 135                 140

Arg Gly Val Leu Met Leu Ala Glu Leu Ser Ser Lys Gly Ser Leu Ala
145                 150                 155                 160

His Gly Glu Tyr Thr Arg Glu Thr Ile Glu Ile Ala Lys Ser Asp Arg
                165                 170                 175

Glu Phe Val Ile Gly Phe Ile Ala Gln Arg Asp Met Gly Gly Arg Glu
            180                 185                 190

Glu Gly Phe Asp Trp Ile Ile Met Thr Pro Gly Val Gly Leu Asp Asp
        195                 200                 205

Lys Gly Asp Ala Leu Gly Gln Gln Tyr Arg Thr Val Asp Glu Val Val
    210                 215                 220

Leu Thr Gly Thr Asp Val Ile Ile Val Gly Arg Gly Leu Phe Gly Lys
225                 230                 235                 240

Gly Arg Asp Pro Glu Val Glu Gly Lys Arg Tyr Arg Asp Ala Gly Trp
                245                 250                 255

Lys Ala Tyr Leu Lys Arg Thr Gly Gln Leu Glu
            260                 265





(2) INFORMATION FOR SEQ ID NO: 107:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 473 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107:

GTCAAAGCAA ATTGTTGGCC CAAGCAGACT CTTGGACCAC CGTTGAATGG AACATAAGCC 60

CAGCCCAACT TCTTAGTAGA TGGTTCAAAC CATCTTTCTG GTCTGAAGTC GTTAGCGTCC 120

TTACCGTAGT ATTCTTCCAA ACGGTGGGTC TTGTAGACAA CGTAAGCAAC ACTGGAGCCT 180

TTAGGAATGT AGATTGGGTC GGTACCGTTA GCACCACCAC CTCTTGGCAA AGTGGTGTCT 240

CTGGTGGCGG TTCTAAAGTT GACAGGAACA GATGGGTACA TACGCAAGGT TTCGTTAAGG 300

ATAGCCTTCA ACTATTCACA TCTCTTCAAG GCTTCGAAAG TAATTTCTTC AACGCGGGAG 360

TCTTCACCAA CACCAAAGTT AACTTCGATT TCTTCTCTCA ACTTGGACCA CATCTCTGGG 420

TGTCTA0CCA ATTCAAACAA AGCAAAGGAC AACAAACCCG CGGTGGTGTC TCT 473